




Introduction to 


MATHEMATICAL 

PROBABILITY 


J. V. Uspensky 

Late professor of mathematics, Stanford University


McGRAW-HILL BOOK COMPANY, Inc. 
New York Toronto London 



Copyright, 1937, by the 
McGraw-Hill Book Company, Inc. 
“Copyright renewed 1965 by Lucille Zander Uspensky.” 

PRINTED IN THE UNITED STATES OF AMERICA

All rights reserved. This book, or
parts thereof, may not be reproduced
in any form without permission of
the publishers.





PREFACE 


This book is an outgrowth of lectures on the theory of probability 
which the author has given at Stanford University for a number of 
years. At first a short mimeographed text covering only the elementary 
parts of the subject was used for the guidance of students. As time 
went on and the scope of the course was gradually enlarged, the necessity 
arose of putting into the hands of students a more elaborate exposition 
of the most important parts of the theory of probability. Accordingly 
a rather large manuscript was prepared for this purpose. The author
did not plan at first to publish it, but students and other persons who had 
opportunity to peruse the manuscript were so persuasive that publication 
was finally arranged. 

The book is arranged in such a way that the first part of it, consisting 
of Chapters I to XII inclusive, is accessible to a person without advanced 
mathematical knowledge. Chapters VII and VIII are, perhaps, exceptions.
The analysis in Chapter VII is rather involved and a better way
to arrive at the same results would be very desirable. At any rate, a 
reader who does not have time or inclination to go through all the 
intricacies of this analysis may skip it and retain only the final results, 
found in Section 11. Chapter VIII, though dealing with interesting 
and historically important problems, is not important in itself and may 
without loss be omitted by readers. Chapters XIII to XVI incorporate 
the results of modern investigations. Naturally they are more complex 
and require more mature mathematical preparation. 

Three appendices are added to the book. Of these the second is by 
far the most important. It gives an outline of the famous Tshebysheff- 
Markoff method of moments applied to the proof of the fundamental 
theorem previously established by another method in Chapter XIV. 

No one will dispute Newton's assertion: "In scientiis addiscendis
exempla magis prosunt quam praecepta." But especially is it so in the
theory of probability. Accordingly, not only are a large number of 
illustrative problems discussed in the text, but at the end of each chapter 
a selection of problems is added for the benefit of students. Some of 
them are mere examples. Others are more difficult problems, or even 
important theorems which did not find a place in the main text. In all 
such cases sufficiently explicit indications of solution (or proofs) are given. 




The book does not go into applications of probability to other sciences. 
To present these applications adequately another volume of perhaps 
larger size would be required. 

No one is more aware than the author of the many imperfections in 
the plan of this book and its execution. To present an entirely satisfactory
book on probability is, indeed, a difficult task. But even with
all these imperfections we hope that the book will prove useful, especially 
since it contains much material not to be found in other books on the 
same subject in the English language. 

J. V. Uspensky. 

Stanford University, 

September, 1937.
ideas, new analytic methods, and new results, in all fairness should be
regarded as one of the most outstanding contributions to mathematical 
literature. It exercised a great influence on later writers on probability 
in Europe, whose work chiefly consisted in elucidation and development 
of topics contained in Laplace's book. 

Thus in European countries further development of the theory of
probability was somewhat retarded. But the subject took on important
developments in the works of Russian mathematicians: Tshebysheff 
(1821-1894) and his former students, A. Markoff (1856-1922) and A. 
Liapounoff (1868-1918). Castelnuovo in his fine book "Calcolo delle
probabilità" rightly regards the contributions to the theory of probability
due to Russian mathematicians as the most important since the time of 
Laplace. 

At the present time interest in the theory of probability is revived 
everywhere, but again the most outstanding recent contributions have 
been made in Russia, chiefly by three prominent mathematicians: S. 
Bernstein, A. Khintchine, and A. Kolmogoroff. 

In closing this introduction it seems proper to quote the closing 
words of the "Essai philosophique sur les probabilités":

On voit par cet Essai, que la théorie des probabilités n'est au fond, que le bon
sens réduit au calcul: elle fait apprécier avec exactitude, ce que les esprits justes
sentent par une sorte d'instinct, sans qu'ils puissent souvent s'en rendre compte.
Elle ne laisse rien d'arbitraire dans le choix des opinions et des partis à prendre,
toutes les fois que l'on peut, à son moyen, déterminer le choix le plus avantageux.
Par là, elle devient le supplément le plus heureux, à l'ignorance et à la faiblesse
de l'esprit humain. Si l'on considère les méthodes analytiques auxquelles cette
théorie a donné naissance, la vérité des principes qui lui servent de base, la
logique fine et délicate qu'exige leur emploi dans la solution des problèmes, les
établissements d'utilité publique qui s'appuient sur elle, et l'extension qu'elle a
reçue et qu'elle peut recevoir encore, par son application aux questions les plus
importantes de la Philosophie naturelle et des sciences morales; si l'on observe
ensuite, que dans les choses mêmes qui ne peuvent être soumises au calcul, elle
donne les aperçus les plus sûrs qui puissent nous guider dans nos jugements,
et qu'elle apprend à se garantir des illusions qui souvent nous égarent; on verra
qu'il n'est point de science plus digne de nos méditations.
CHAPTER I


COMPUTATION OF PROBABILITIES BY DIRECT 
ENUMERATION OF CASES 

1. The probability of an event can be found by direct application
of the definition when it is possible to make a complete enumeration of
all equally likely cases, as well as of those favorable to that event. Here
we shall consider a few problems, beginning with the simplest, to illustrate
this direct method of evaluating probabilities.

Problem 1. Two dice are thrown. What is the probability of
obtaining a total of 7 or 8 points?

Solution. Suppose we distinguish the dice by the numbers 1 and 2.
There are 6 possible cases as to the number of points on the first die;
and each of these cases can be accompanied by any of the 6 possible
numbers of points on the second die. Hence, we can distinguish altogether
6 × 6 = 36 different cases. Provided the dice are ideally regular
in shape and perfectly homogeneous, we have good reason to consider
these 36 cases as equally likely, and we shall so consider them.

Next, let us find out how many cases are favorable to the total of
7 points. This may happen only in the following ways:

First Die    Second Die
    1            6
    2            5
    3            4
    4            3
    5            2
    6            1

Likewise, for 8 points:

First Die    Second Die
    2            6
    3            5
    4            4
    5            3
    6            2

That is, out of the total number of 36 cases there are 6 cases favorable
to 7 points and 5 cases favorable to 8 points; hence, the probability of
obtaining 7 points is 6/36 = 1/6 and the probability of obtaining 8 points is 5/36.
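The count in Problem 1 can be checked mechanically. The following is a minimal sketch, not part of the original text; the function name `sum_probability` is an illustrative assumption. It lists all 36 ordered cases and counts those with the required total.

```python
from fractions import Fraction
from itertools import product


def sum_probability(total: int) -> Fraction:
    """Probability that two fair dice show the given total, by counting cases."""
    cases = list(product(range(1, 7), repeat=2))       # 6 x 6 = 36 equally likely cases
    favorable = [c for c in cases if sum(c) == total]  # cases with the required total
    return Fraction(len(favorable), len(cases))


print(sum_probability(7))  # 1/6
print(sum_probability(8))  # 5/36
```

Exact fractions are used instead of floating-point numbers so that the results can be compared directly with the values in the text.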

Problem 2. A coin is tossed three times in succession. What
is the probability of obtaining 2 heads? What is the probability of
obtaining tails at least once?
Solution. In the first throw there are two possible cases: heads or
tails. And if the coin is unbiased (which we assume is true) these two
cases may be considered as equally likely. In two throws there are
2 × 2 = 4 cases; namely, both of the two possible cases in the first toss
can combine with both of the possible cases in the second. Similarly,
in three throws the number of cases will be 2 × 2 × 2 = 8. To find
the number of cases favorable to obtaining 2 heads, we must consider
that this can happen only in three ways:

Heads    Heads    Tails
Heads    Tails    Heads
Tails    Heads    Heads

The number of favorable cases being 3, the probability of obtaining
two heads is 3/8.

To answer the second part of the question, we observe that there is
only one case when tails does not turn up. Therefore, the number of
cases favorable to obtaining tails at least once is 8 - 1 = 7, so that
the required probability is 7/8.
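Both answers of Problem 2 can be confirmed by listing the 8 sequences of three tosses. This is a small illustrative sketch; the variable names are assumptions introduced here, and "2 heads" is read as exactly two heads, as in the solution.

```python
from fractions import Fraction
from itertools import product

# All 2 x 2 x 2 = 8 equally likely sequences of three tosses.
tosses = list(product("HT", repeat=3))

two_heads = [t for t in tosses if t.count("H") == 2]  # exactly two heads
some_tails = [t for t in tosses if "T" in t]          # tails at least once

p_two_heads = Fraction(len(two_heads), len(tosses))
p_some_tails = Fraction(len(some_tails), len(tosses))

print(p_two_heads)   # 3/8
print(p_some_tails)  # 7/8
```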

Problem 3. Two cards are drawn from a deck of well-shuffled
cards. What is the probability that both the extracted cards are
aces?

Solution. Since there are 52 cards in the deck, there are 52 ways
of extracting the first card. After the first card has been withdrawn,
the second extracted card may be one of the remaining 51 cards. Therefore,
the total number of ways to draw two cards is 52 × 51. All these
cases may be considered as equally likely.

To find the number of cases favorable to drawing aces, we observe
that there are 4 aces; therefore, there are 4 ways to get the first ace.
After it has been extracted, there are 3 ways to get a second ace. Hence,
the total number of ways to draw 2 aces is 4 × 3 and the required
probability is:

$$\frac{4 \times 3}{52 \times 51} = \frac{1}{13 \times 17} = \frac{1}{221}.$$


Problem 4. Two cards are drawn from a full pack, the first card
being returned to the pack before the second is taken. What is the
probability that both the extracted cards belong to a specified suit?

Solution. There are 52 ways of getting the first card. For the
second drawing, there are also 52 ways, because by returning the first
card extracted to the pack, the original number was restored. Under
such circumstances, the total number of ways to extract two cards is
52 × 52. Now, because there are 13 cards in a suit, the number of
cases favorable to obtaining two cards of a specified suit is 13 × 13.
Therefore, the required probability is given by:

$$\frac{13 \times 13}{52 \times 52} = \frac{1 \times 1}{4 \times 4} = \frac{1}{16}.$$
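The difference between drawing without and with replacement (Problems 3 and 4) shows up directly in how the ordered pairs are generated. The sketch below is only an illustration; the card encoding and the choice of spades as the "specified suit" are assumptions made here.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = "A23456789TJQK"
SUITS = "SHDC"
deck = [(rank, suit) for rank in RANKS for suit in SUITS]  # 52 cards

# Problem 3: the first card is not returned, so the two cards are distinct.
without_replacement = list(permutations(deck, 2))          # 52 x 51 ordered pairs
aces = [pair for pair in without_replacement
        if pair[0][0] == "A" and pair[1][0] == "A"]        # 4 x 3 favorable cases
p_two_aces = Fraction(len(aces), len(without_replacement))

# Problem 4: the first card is returned, so any card may appear twice.
with_replacement = list(product(deck, repeat=2))           # 52 x 52 ordered pairs
same_suit = [pair for pair in with_replacement
             if pair[0][1] == "S" and pair[1][1] == "S"]   # 13 x 13 favorable cases
p_two_spades = Fraction(len(same_suit), len(with_replacement))

print(p_two_aces)    # 1/221
print(p_two_spades)  # 1/16
```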

4. Problem 5. An urn contains 3 white and 5 black balls. One ball
is drawn. What is the probability that it is black?

Solution. The total number of balls is 8. To distinguish them, we
may imagine that they are numbered. As to the number on the ball
drawn, there are 8 possible cases that may reasonably be considered as
equally likely. Obviously, there are 5 cases favorable to the black color
of the ball drawn. Therefore, the required probability is 5/8.

By a slight modification of the last problem, we come to the following
interesting situation:

Problem 6. The contents of the urn are the same as in the foregoing
problem. But this time suppose that one ball is drawn and, its color
unnoted, laid aside. Then another ball is drawn, and we are required to
find the probability that it is black or white.

Solution. Suppose again that the balls are numbered, so that the
white balls bear numbers 1, 2, and 3; and the black balls bear numbers
4, 5, 6, 7, 8. Obviously, there are 8 ways to get the first ball, and whatever
it is, there remain only 7 ways to get the second ball. The total
number of equally likely cases is 8 × 7 = 56.

It is a little more difficult to find the number of cases favorable to
extracting a white or black ball in the second drawing. Suppose we are
interested in the white color of the second ball. If the first ball drawn is
a white one, it may bear one of the numbers 1 to 3. Whatever this
number is, the second ball, if it is white, can bear only the two remaining
numbers. Therefore, under the assumption that the first ball is a white
one, the number of favorable cases is 3 × 2 = 6. Again, supposing that
the first ball drawn is black, we have 5 possibilities as to its number, and,
corresponding to any one of these possibilities, there are 3 possibilities
as to the number of the white ball to be taken in the second drawing,
so that the number of favorable cases now is 5 × 3 = 15. The number
of all favorable cases is 6 + 15 = 21. The required probability for
the white ball is 21/56 = 3/8. In the same way, we should find
that the probability for the black ball is 5/8. It is remarkable that
these two probabilities are the same as if only a single ball had been
drawn.

The situation is quite different if we know the color of the first ball.
Suppose, for instance, that it is white. The total number of equally
likely cases will then be 3 × 7 = 21, and the number of cases favorable
to getting another white ball is 3 × 2 = 6, so that the probability in
this case is 6/21 = 2/7.
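The two situations of Problem 6 (color of the first ball unknown, then known to be white) can be compared by enumerating the 56 ordered pairs of numbered balls. This is a hedged sketch with illustrative variable names, not the author's own procedure; the balls are numbered from 0 for convenience.

```python
from fractions import Fraction
from itertools import permutations

# Balls 0-2 are white, balls 3-7 are black.
colors = ["W"] * 3 + ["B"] * 5
pairs = list(permutations(range(8), 2))  # 8 x 7 = 56 ordered (first, second) draws

# Color of the first ball unnoted: count pairs whose second ball is white.
second_white = [p for p in pairs if colors[p[1]] == "W"]
p_second_white = Fraction(len(second_white), len(pairs))

# First ball known to be white: restrict the cases to those 3 x 7 = 21 pairs.
first_white = [p for p in pairs if colors[p[0]] == "W"]
both_white = [p for p in first_white if colors[p[1]] == "W"]
p_white_given_white = Fraction(len(both_white), len(first_white))

print(p_second_white)       # 3/8
print(p_white_given_white)  # 2/7
```

Restricting the list of cases before counting is the enumeration counterpart of knowing the color of the first ball.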
This last example shows clearly how much probability depends upon
a given or known set of conditions.

Problem 7. Three boxes, identical in appearance, each have two
drawers. The first box contains a gold coin in each drawer; the second
contains a silver coin in each drawer; but the third contains a gold coin
in one drawer and a silver coin in the other. (a) A box is chosen at random.
What is the probability that it contains coins of different metals?
(b) A box is chosen, one of its drawers opened, and a gold coin found.
What is the probability that the other drawer contains a silver coin?

Solution. (a) Since nothing outwardly distinguishes one box from
the other, we may recognize three equally likely cases, and among them
is only one case of a box with coins of different metals. Therefore, we
estimate the required probability as 1/3.

(b) As to the second question, one is tempted to reason as follows:
The fact that a gold coin was found in one drawer leaves only two
possibilities as to the content of the other drawer; namely, that the coin
in it is either gold or silver. Hence, the probability of a silver coin in
the second drawer seems to be 1/2. But this reasoning is fallacious.
It is true that, when the gold coin is found in one drawer, there are only
two possibilities left as to the content of the other drawer; but these
possibilities cannot be considered as equally likely. To see this point
clearly, let us distinguish the drawers of the first box by the numbers 1
and 2; those of the second box by the numbers 3 and 4; finally, in the
third box, 5 will distinguish the drawer containing the silver coin, while
6 will represent the drawer with the gold coin.

Instead of three equally likely cases:

box 1, box 2, box 3

we now have six cases:

drawers 1, 2; drawers 3, 4; drawers 5, 6,

which, with reference to the fundamental assumptions, must be considered
as equally likely. If nothing were known about the contents
of the drawer which has been opened, the number of this drawer might be
either 1, 2, 3, 4, 5, or 6. But as soon as the gold coin is discovered in it,
cases 3, 4, and 5 become impossible, and there remain three equally likely
assumptions as to the number of the opened drawer: it may be either 1 or
2 or 6. That leaves three cases, and in only one of them, namely, in
case 6, will the other drawer contain a silver coin. Thus the answer
to the second question (b) is 1/3.
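The argument for part (b) of Problem 7 amounts to counting drawers rather than boxes. The sketch below follows the drawer numbering of the text; the dictionary layout and the helper `other_drawer` are illustrative assumptions.

```python
from fractions import Fraction

# Drawer number -> coin it holds, numbered as in the text:
# drawers 1, 2 (first box), 3, 4 (second box), 5, 6 (third box).
drawers = {1: "gold", 2: "gold", 3: "silver", 4: "silver", 5: "silver", 6: "gold"}


def other_drawer(d: int) -> int:
    """Return the number of the second drawer in the same box."""
    return d + 1 if d % 2 == 1 else d - 1


# A gold coin was found, so the opened drawer is one of the gold drawers.
opened = [d for d, coin in drawers.items() if coin == "gold"]           # [1, 2, 6]
favorable = [d for d in opened if drawers[other_drawer(d)] == "silver"]  # [6]

p_silver_in_other = Fraction(len(favorable), len(opened))
print(p_silver_in_other)  # 1/3
```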

6. In the preceding problems the enumeration of cases did not
present any difficulty. We are now going to discuss a few problems in
which this enumeration is not so obvious but can be greatly simplified
by the use of well-known formulas for the number of permutations,
arrangements, and combinations.

Let m distinct objects be represented by the letters a, b, c, . . . t.
Using all these objects, we can place them in different orders and form
"permutations." For instance, if there are only three letters, a, b, and c,
all the possible permutations are: abc, acb, bac, bca, cab, cba; 6 different
permutations out of 3 letters. In general, the number of permutations
$P_m$ of m objects is expressed by

$$P_m = 1 \cdot 2 \cdot 3 \cdots m = m!$$


If n objects are taken out of the total number of m objects to form
groups, attention being paid to the order of objects in each group, then
these groups are called "arrangements." For instance, by taking two
letters out of the four letters a, b, c, d, we can form the following 12
arrangements:

ab    ba    ca    da
ac    bc    cb    db
ad    bd    cd    dc

Denoting by the symbol $A_m^n$ the number of arrangements of m
objects taken n at a time, the following formula holds:

$$A_m^n = m(m - 1)(m - 2) \cdots (m - n + 1).$$

Again, if we form groups of n objects taken out of the total number of
m objects, this time paying no attention to the order of objects in the
group, we form "combinations." For instance, following are the
different combinations out of 5 objects taken 3 at a time:

abc    abd    abe    acd    ace
ade    bcd    bce    bde    cde

In general, the number of combinations out of m objects taken n
at a time, which is usually denoted by the symbol $C_m^n$, is given by

$$C_m^n = \frac{m(m - 1)(m - 2) \cdots (m - n + 1)}{1 \cdot 2 \cdot 3 \cdots n}.$$


It is useful to recall that the same expression may also be exhibited
as follows:

$$C_m^n = \frac{m!}{n!\,(m - n)!}$$

whence, by substituting m - n instead of n, the useful formula

$$C_m^n = C_m^{m-n}$$

can be derived.
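The three counting formulas can be compared with explicit listings. The sketch below uses Python's standard `itertools` and `math` modules (`math.perm` and `math.comb` require Python 3.8 or later); the mapping of the book's symbols onto these functions, namely $A_m^n$ = `math.perm(m, n)` and $C_m^n$ = `math.comb(m, n)`, is an assumption of this illustration.

```python
import math
from itertools import combinations, permutations

# Permutations of 3 letters: P_3 = 3! = 6.
perms = ["".join(p) for p in permutations("abc")]

# Arrangements of 4 letters taken 2 at a time (order matters).
arrangements = ["".join(p) for p in permutations("abcd", 2)]

# Combinations of 5 letters taken 3 at a time (order ignored).
combos = ["".join(c) for c in combinations("abcde", 3)]

print(len(perms), math.factorial(3))     # 6 6
print(len(arrangements), math.perm(4, 2))  # 12 12
print(len(combos), math.comb(5, 3))      # 10 10
print(math.comb(5, 3) == math.comb(5, 2))  # True: C(m, n) equals C(m, m - n)
```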




7. After these preliminary remarks, we can turn to the problems in
which the foregoing formulas will often be used.

Problem 8. An urn contains $a$ white balls and $b$ black balls.
If $\alpha + \beta$ balls are drawn from this urn, find the probability that among
them there will be exactly $\alpha$ white and $\beta$ black balls.

Solution. If we do not distinguish the order in which the balls come
out of the urn, the total number of ways to get $\alpha + \beta$ balls out of the
total number $a + b$ balls is obviously expressed by $C_{a+b}^{\alpha+\beta}$ and this is
the number of all possible and equally likely cases in this problem. The
number of ways to draw $\alpha$ white balls out of the total number $a$ of white
balls in the urn is $C_a^{\alpha}$; and similarly $C_b^{\beta}$ represents the number of ways
of drawing $\beta$ black balls out of the total number $b$ of black balls. Now
every group of $\alpha$ white balls combines with every possible group of $\beta$
black balls to form the total of $\alpha$ white balls and $\beta$ black balls, so that
the number of ways to form all the groups containing $\alpha$ white balls and
$\beta$ black balls is $C_a^{\alpha} \cdot C_b^{\beta}$. This is also the number of favorable cases;
hence, the required probability is

$$p = \frac{C_a^{\alpha} \cdot C_b^{\beta}}{C_{a+b}^{\alpha+\beta}}$$

or, in a more explicit form,

$$(1) \qquad p = \frac{1 \cdot 2 \cdot 3 \cdots (\alpha + \beta)}{1 \cdot 2 \cdots \alpha \cdot 1 \cdot 2 \cdots \beta} \cdot \frac{a(a - 1) \cdots (a - \alpha + 1) \cdot b(b - 1) \cdots (b - \beta + 1)}{(a + b)(a + b - 1) \cdots (a + b - \alpha - \beta + 1)}.$$
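The formula of Problem 8 translates directly into a short function. This is a minimal sketch; the function name `urn_probability` and its parameter names are assumptions introduced here, with `math.comb(m, n)` standing for the book's $C_m^n$.

```python
from fractions import Fraction
from math import comb


def urn_probability(a: int, b: int, alpha: int, beta: int) -> Fraction:
    """Probability of exactly alpha white and beta black balls among
    alpha + beta balls drawn from an urn with a white and b black balls."""
    favorable = comb(a, alpha) * comb(b, beta)  # C_a^alpha * C_b^beta
    total = comb(a + b, alpha + beta)           # all equally likely groups
    return Fraction(favorable, total)


# Urn with 3 white and 5 black balls, 2 balls drawn: one of each color.
print(urn_probability(3, 5, 1, 1))  # 15/28
```

As a sanity check, the probabilities for every possible split of the drawn balls into white and black should add up to 1.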

Problem 9. An urn contains n tickets bearing numbers from 1 to n,
and m tickets are drawn at a time. What is the probability that i of
the tickets removed have numbers previously specified?

Solution. This problem does not essentially differ from the preceding
one. In fact, i tickets with preassigned numbers can be likened to i
white balls, while the remaining tickets correspond to the black balls.
The required probability, therefore, can be obtained from the expression
(1) by taking $a = i$, $b = n - i$, $\alpha = i$, $\beta = m - i$ and, all simplifications
performed, will be given by

$$(2) \qquad \frac{m(m - 1) \cdots (m - i + 1)}{n(n - 1) \cdots (n - i + 1)}.$$

The conditions of this problem were realized in the French lottery,
which was operated by the French royal government for a long time but
discontinued soon after the Revolution of 1789. Similar lotteries
continued to exist in other European countries throughout the nineteenth
century. In the French lottery, tickets bearing numbers from 1 to 90
were sold to the people, and at regular intervals drawings for winning
numbers were held in different French cities. At each drawing, 5
numbers were drawn. If a holder of tickets won on a single number,
he received 15 times its cost to him. If he won on two, three, four, or
five tickets, he could claim respectively 270, 5,500, 75,000, and, finally,
1,000,000 times their cost to him.

The numerical values of the probabilities corresponding to these
different cases are worked out as follows: we must take n = 90, m = 5,
and i = 1, 2, 3, 4, or 5 in the expression (2). The results are

Single ticket    5/90 = 1/18
Two tickets      (5·4)/(90·89) = 2/801
Three tickets    (5·4·3)/(90·89·88) = 1/11748
Four tickets     (5·4·3·2)/(90·89·88·87) = 1/511038
Five tickets     (5·4·3·2·1)/(90·89·88·87·86) = 1/43949268
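The lottery probabilities follow from the product m(m - 1) ... (m - i + 1) divided by n(n - 1) ... (n - i + 1) with n = 90 and m = 5. The sketch below evaluates that product exactly; the function name `lottery_probability` is an illustrative assumption.

```python
from fractions import Fraction


def lottery_probability(n: int, m: int, i: int) -> Fraction:
    """Probability that i previously specified numbers are all among
    the m numbers drawn out of n."""
    p = Fraction(1)
    for j in range(i):
        p *= Fraction(m - j, n - j)  # factors (m - j) / (n - j), j = 0 .. i - 1
    return p


for i in range(1, 6):
    print(i, lottery_probability(90, 5, i))
# 1 1/18
# 2 2/801
# 3 1/11748
# 4 1/511038
# 5 1/43949268
```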


8. Problem 10. From an urn containing $a$ white balls and $b$ black
ones, a certain number of balls, k, is drawn, and they are laid aside, their
color unnoted. Then one more ball is drawn; and it is required to find
the probability that it is a white or a black ball.

Solution. Suppose the k balls removed at first and the last ball
drawn are laid on k + 1 different places, so that the last ball occupies
the position at the extreme right. The number of ways to form groups
of k + 1 balls out of the total number of $a + b$ balls, attention being
paid to the order, is

$$(a + b)(a + b - 1) \cdots (a + b - k).$$

Such is the total number of cases in this problem, and they may all be
considered as equally likely. To find the number of cases favorable to
a white ball, we observe that the last place should be occupied by one of
the $a$ white balls. Whatever this white ball is, the preceding k balls
form one of the possible arrangements out of $a + b - 1$ remaining balls
taken k at a time. Hence, it is obvious that the number of cases favorable
to a white ball is

$$a(a + b - 1) \cdots (a + b - k),$$

and therefore the required probability is given by

$$\frac{a}{a + b}$$
for a white ball. In a similar way we find the probability $b/(a + b)$ of
drawing a black ball. These results show that the probability of getting
white or black balls in this problem is the same as if no balls at all were
removed at first. Here we have proof that the peculiar circumstances
observed in Prob. 6 are general.
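The general statement of Problem 10 can be tested by brute force for small urns. The following sketch enumerates every ordered arrangement of k + 1 numbered balls and checks the color of the last one; the function name `last_ball_white` is an assumption of this illustration, and the enumeration is practical only for small values of a, b, and k.

```python
from fractions import Fraction
from itertools import permutations


def last_ball_white(a: int, b: int, k: int) -> Fraction:
    """Probability that the (k + 1)-th ball drawn is white when the first
    k balls are laid aside unnoted, found by enumerating arrangements."""
    colors = ["W"] * a + ["B"] * b
    total = 0
    favorable = 0
    for arrangement in permutations(range(a + b), k + 1):
        total += 1
        if colors[arrangement[-1]] == "W":  # ball in the extreme right place
            favorable += 1
    return Fraction(favorable, total)


# With 3 white and 4 black balls the answer is 3/7 whatever k is.
for k in range(4):
    print(k, last_ball_white(3, 4, k))
```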

Problem 11. Two dice are thrown n times in succession. What is
the probability of obtaining double six at least once?

Solution. As there are 36 cases in every throw and each case of the
first throw can combine with each case of the second throw, and so on,
the total number of cases in n throws will be $36^n$. Instead of trying to
find the number of favorable cases directly, it is easier to find the number
of unfavorable cases; that is, the number of cases in which double sixes
would be excluded. In one throw there are 35 such cases, and in n throws
there will be $35^n$. Now, excluding these cases, we obtain $36^n - 35^n$
favorable cases; hence, the required probability is

$$P = 1 - \left(\tfrac{35}{36}\right)^n.$$

If one die were thrown n times in succession, the probability to obtain
6 points at least once would be

$$P = 1 - \left(\tfrac{5}{6}\right)^n.$$

Now, suppose we want to find the number of throws sufficient to
assure a probability > 1/2 of obtaining double six at least once. To this
end we must solve the inequality

$$\left(\tfrac{35}{36}\right)^n < \tfrac{1}{2}$$

for n; whence we find

$$n > \frac{\log 2}{\log 36 - \log 35} = 24.6 \ldots$$

It means that in 25 throws there is more likelihood to obtain double
six at least once than not to obtain it at all. On the other hand, in
24 throws, we have less chance to succeed than to fail.

Now, if we dealt with a single die, we should find that in 4 throws
there are more chances to obtain 6 points at least once than there are
chances to fail.
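The thresholds of 25 throws for double six and 4 throws for a single six can be recomputed exactly. The sketch below searches for the smallest n for which the probability of failing in every throw drops below 1/2; the function name `throws_needed` is an assumption introduced for illustration.

```python
import math
from fractions import Fraction


def throws_needed(p_fail: Fraction) -> int:
    """Smallest n for which p_fail ** n < 1/2, i.e. for which the chance of
    at least one success in n throws exceeds 1/2."""
    n = 1
    q = p_fail
    while q >= Fraction(1, 2):
        n += 1
        q *= p_fail
    return n


print(throws_needed(Fraction(35, 36)))  # 25 (double six with two dice)
print(throws_needed(Fraction(5, 6)))    # 4  (one six with a single die)

# The bound from the logarithmic formula, for comparison.
print(round(math.log(2) / (math.log(36) - math.log(35)), 1))  # 24.6
```

Exact fractions are used in the search so that the comparison with 1/2 is not affected by rounding.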

This problem is interesting in a historical respect, for it was the first
problem on probability solved by Pascal, who, together with his great
contemporary Fermat, had laid the first foundations of the theory of
probability. This problem was suggested to Pascal by a certain French
nobleman, Chevalier de Méré, a man of great experience in gambling.
He had observed the advantage of betting for double six in 25 throws
and for one six (with a single die) in 4 throws. He found it difficult to
understand because, he said, there were 36 cases for two dice and 6 cases
for one die in each throw, and yet it is not true that 25:4 = 36:6. Of
course, there is no reason for such an arbitrary conclusion, and the correct
solution as given by Pascal not only removed any apparent paradoxes
in this case, but it led to the same number, 25, observed by gamblers in
their daily experience.

Problem 12. A certain number n of identical balls is distributed
among N compartments. What is the probability that a certain specified
compartment will contain h balls?

Solution. To find the number of all possible cases in this problem,
suppose that we distinguish the balls by numbering them from 1 to n.
The ball with the number 1 may fall into any of the N compartments,
which gives N cases. The ball with the number 2 may also fall into any
one of the N compartments; so that the number of cases for 2 balls will
be $N \cdot N = N^2$. Likewise, for 3 balls the number of cases will be

$$N^2 \cdot N = N^3$$

and for any number n of balls the number of cases will be $N^n$. To find
the number of favorable cases, first suppose that a group of h specified
balls falls into a designated compartment. The remaining n - h balls may
be distributed in any way among N - 1 remaining compartments. But
the number of ways to distribute n - h balls among N - 1 compartments
is $(N - 1)^{n-h}$ and this becomes the number of all favorable cases
in which a specified group of h balls occupies the designated compartment.
Now, it is possible to form $C_n^h$ such groups; therefore, the total number of
favorable cases is given by

$$C_n^h \cdot (N - 1)^{n-h},$$

and the required probability will be

$$p_h = \frac{C_n^h \cdot (N - 1)^{n-h}}{N^n}.$$
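For small n and N the formula of Problem 12 can be compared with a direct enumeration of all $N^n$ placements of numbered balls. The sketch below is illustrative only; both function names are assumptions, and compartment 0 plays the role of the designated compartment.

```python
from fractions import Fraction
from itertools import product
from math import comb


def occupancy_probability(n: int, N: int, h: int) -> Fraction:
    """p_h = C_n^h (N - 1)^(n - h) / N^n, the formula derived in the text."""
    return Fraction(comb(n, h) * (N - 1) ** (n - h), N ** n)


def occupancy_by_enumeration(n: int, N: int, h: int) -> Fraction:
    """Same probability, found by listing the compartment of every ball."""
    cases = product(range(N), repeat=n)  # N^n equally likely placements
    favorable = sum(1 for placement in cases if placement.count(0) == h)
    return Fraction(favorable, N ** n)


print(occupancy_probability(4, 3, 2))     # 8/27
print(occupancy_by_enumeration(4, 3, 2))  # 8/27
```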


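In case n, N and h are large numbers, the direct application of this
formula becomes difficult, and it is advisable to seek an approximate
expression for $p_h$. To this end we write the preceding expression thus:

$$p_h = \frac{n^h}{1 \cdot 2 \cdot 3 \cdots h \cdot N^h} \left(1 - \frac{1}{N}\right)^{n-h} P$$

where

$$P = \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \cdots \left(1 - \frac{h - 1}{n}\right).$$

Now, supposing $1 \le k \le h - 1$, we have

$$(a) \qquad \left(1 - \frac{k}{n}\right)\left(1 - \frac{h - k}{n}\right) = 1 - \frac{h}{n} + \frac{k(h - k)}{n^2} > 1 - \frac{h}{n}.$$

On the other hand,

$$k(h - k) \le \frac{h^2}{4}$$

and so

$$(b) \qquad \left(1 - \frac{k}{n}\right)\left(1 - \frac{h - k}{n}\right) \le \left(1 - \frac{h}{2n}\right)^2.$$

The inequalities (a) and (b) give simple lower and upper limits for P.
For we can write $P^2$ thus:

$$P^2 = \prod_{k=1}^{h-1} \left(1 - \frac{k}{n}\right)\left(1 - \frac{h - k}{n}\right)$$

and then apply (a) or (b), which leads to these inequalities

$$P > \left(1 - \frac{h}{n}\right)^{\frac{h-1}{2}}, \qquad P \le \left(1 - \frac{h}{2n}\right)^{h-1}.$$

Correspondingly, we have

$$p_h \le \frac{n^h}{1 \cdot 2 \cdot 3 \cdots h \cdot N^h} \left(1 - \frac{1}{N}\right)^{n-h} \left(1 - \frac{h}{2n}\right)^{h-1},$$

$$p_h > \frac{n^h}{1 \cdot 2 \cdot 3 \cdots h \cdot N^h} \left(1 - \frac{1}{N}\right)^{n-h} \left(1 - \frac{h}{n}\right)^{\frac{h-1}{2}}.$$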
Problem 13. What is the probability of obtaining a given sum s of
points with n dice?

Solution. The number of all cases for n dice is evidently $6^n$. The
number of favorable cases is the same as the total number of solutions of
the equation

$$(1) \qquad \alpha_1 + \alpha_2 + \cdots + \alpha_n = s$$

where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are integers from 1 to 6. This number can be
determined by means of the following device: Multiplying the polynomial

$$(2) \qquad x + x^2 + x^3 + x^4 + x^5 + x^6$$

by itself, the product will consist of terms

$$x^{\alpha_1 + \alpha_2}$$

where $\alpha_1$ and $\alpha_2$ independently assume all integral values from 1 to 6.
Collecting terms with the same exponent s, the coefficient of $x^s$ will give
the number of solutions of the equation

$$\alpha_1 + \alpha_2 = s,$$

$\alpha_1$, $\alpha_2$ being subject to the above mentioned limitations.

Similarly, multiplying the same polynomial (2) three times by itself
and collecting terms with the same exponent s, the coefficient of $x^s$ will
give the number of solutions of equation (1) for n = 3. In general, the
number of solutions of equation (1) for any n is the coefficient of $x^s$ in
the expanded polynomial

$$(x + x^2 + x^3 + x^4 + x^5 + x^6)^n.$$

Now we have identically

$$x + x^2 + x^3 + x^4 + x^5 + x^6 = \frac{x(1 - x^6)}{1 - x}$$

and by the binomial theorem

$$x^n (1 - x^6)^n = x^n \sum_{l=0}^{n} (-1)^l C_n^l x^{6l},$$

$$(1 - x)^{-n} = \sum_{k=0}^{\infty} C_{n+k-1}^{n-1} x^k.$$

Multiplying these series we find the following expression as the
coefficient of $x^s$:

$$\sum_{l \ge 0} (-1)^l C_n^l C_{s-6l-1}^{n-1},$$

where summation extends over integers l not exceeding $\frac{s - n}{6}$. The same
sum represents the number of favorable cases. Dividing it by $6^n$, we
get the following expression for the probability of s points on n dice:

$$\frac{1}{6^n} \sum_{l \ge 0} (-1)^l C_n^l C_{s-6l-1}^{n-1}.$$
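The closed formula for the number of ways to obtain s points with n dice can be checked against the polynomial device itself, that is, against the coefficients of the expanded power of x + x^2 + ... + x^6. The sketch below is an illustration under the assumption that `math.comb(m, n)` stands for the book's $C_m^n$; the function names are introduced here.

```python
from fractions import Fraction
from math import comb


def ways_formula(n: int, s: int) -> int:
    """Number of ways to get the sum s with n dice, by the alternating sum
    over l = 0, 1, ... not exceeding (s - n) / 6."""
    total = 0
    for l in range((s - n) // 6 + 1):  # empty range when s < n
        total += (-1) ** l * comb(n, l) * comb(s - 6 * l - 1, n - 1)
    return total


def ways_by_expansion(n: int, s: int) -> int:
    """Coefficient of x^s in (x + x^2 + ... + x^6)^n, by direct multiplication."""
    die = [0, 1, 1, 1, 1, 1, 1]  # coefficients of x^0 .. x^6
    coeffs = [1]                 # the constant polynomial 1
    for _ in range(n):
        product = [0] * (len(coeffs) + 6)
        for i, c in enumerate(coeffs):
            for j, d in enumerate(die):
                product[i + j] += c * d
        coeffs = product
    return coeffs[s] if 0 <= s < len(coeffs) else 0


def sum_probability_n_dice(n: int, s: int) -> Fraction:
    """Probability of the sum s with n dice."""
    return Fraction(ways_formula(n, s), 6 ** n)


print(ways_formula(3, 10), ways_by_expansion(3, 10))  # 27 27
print(sum_probability_n_dice(2, 7))                   # 1/6
```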

The preceding problems suffice to illustrate how probability can be
determined by direct enumeration of cases. For the benefit of students,
a few simple problems without elaborate solutions are added here.



CHAPTER II

THEOREMS OF TOTAL AND COMPOUND PROBABILITY 


1. As the problems become more complex the difficulties in enumerating
cases grow and often the computation of probabilities by direct
application of definition becomes very involved. In many cases the
complications can be avoided by use of two theorems which are fundamental
in the theory of probability.

Before we can give a clear and exact statement of the first fundamental
theorem, we must define what is meant by "mutually exclusive" or
"incompatible" events. Events are called mutually exclusive or
incompatible if the occurrence of one of them precludes the occurrence
of all the others. For instance, the four events concerning the number
of points on two dice

First Die    Second Die
    1            4
    2            3
    3            2
    4            1

are mutually exclusive because it is evident that as soon as one of them 
occurs, none of the others can materialize. 

On the contrary, events are compatible if it is possible for them to 
materialize simultaneously. For instance, the events of 5 points on one 
die and 5 points on the other, are compatible, since in tossing two dice 
it is possible to get 5 points on each. 

To denote the probability of an event A, we shall use the symbol (A).
To denote the probability of A or B (or both) we shall use the symbol
(A + B). Dealing with several events A, B, . . . L, the symbol

$$(A + B + \cdots + L)$$

will denote the probability of the occurrence of at least one of them.
If A, B, . . . L are mutually exclusive events, this symbol represents
the probability of the occurrence of one of them without specification as
to which one.

2. Now we shall state the first fundamental theorem, called the
"theorem of total probability" or "theorem of addition of probabilities,"
in the following way:

Theorem of Total Probability. The probability for one of the mutually
exclusive events $A_1, A_2, \ldots, A_n$ to materialize is the sum of the probabilities
of these events. In symbolical notations, it is expressed thus:

$$(A_1 + A_2 + \cdots + A_n) = (A_1) + (A_2) + \cdots + (A_n).$$

Proof. Let N be the number of all possible and equally likely cases
out of which $m_1$ cases are favorable to the event $A_1$, $m_2$ cases are favorable
to the event $A_2$, . . . , and finally, $m_n$ cases are favorable to the event $A_n$.
These cases are all different, since events $A_1, A_2, \ldots, A_n$ are incompatible.
The number of cases favorable to either $A_1$ or $A_2$, . . . or $A_n$ is
therefore

$$m_1 + m_2 + \cdots + m_n.$$

Hence, by definition

$$(A_1 + A_2 + \cdots + A_n) = \frac{m_1 + m_2 + \cdots + m_n}{N} = \frac{m_1}{N} + \frac{m_2}{N} + \cdots + \frac{m_n}{N}.$$

Again, by definition of probability,

$$\frac{m_1}{N} = (A_1), \qquad \frac{m_2}{N} = (A_2), \qquad \ldots, \qquad \frac{m_n}{N} = (A_n),$$

and so finally

$$(A_1 + A_2 + \cdots + A_n) = (A_1) + (A_2) + \cdots + (A_n),$$

as stated.

3. It is important to know that the same theorem, stated in a slightly
different form, is especially useful in applications. An event A can
occur in several mutually exclusive forms, $A_1, A_2, \ldots, A_n$, which may
be considered as that many mutually exclusive events. Whenever A
occurs, one of these events must occur, and conversely. Consequently,
the probability of A is the same as the probability of one (unspecified)
of its mutually exclusive forms. If, for instance, occurrence of 5 points
on two dice is A, then this event occurs in 4 mutually exclusive forms, as
tabulated above.

From the new point of view, the theorem of total probability can be 
stated thus: 

Second Form of Theorem of Total Probability. The probability of
an event $A$ is the sum of the probabilities of its mutually exclusive forms
$A_1, A_2, \ldots A_n$; or, using symbols,

$$(A) = (A_1) + (A_2) + \cdots + (A_n).$$

Probabilities $(A_1), (A_2), \ldots (A_n)$ are partial probabilities of
incompatible forms of $A$. Since the probability of $A$ is their sum, it may
be called a total probability of $A$. Hence the name of the theorem.

In the preceding example we saw that 5 points on two dice could be
obtained in 4 mutually exclusive ways. Now the probability of any one
of these ways is $\frac{1}{36}$; hence, by the preceding theorem, the
probability of obtaining 5 points with two dice is

$$\frac{1}{36} + \frac{1}{36} + \frac{1}{36} + \frac{1}{36} = \frac{4}{36} = \frac{1}{9},$$

as it should be.
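
The dice example can be checked by direct enumeration. The following is a minimal sketch in Python, not part of the original text; the names `outcomes`, `forms`, and `total` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing two dice.
outcomes = list(product(range(1, 7), repeat=2))

# The mutually exclusive forms in which 5 points can occur.
forms = [(x, y) for (x, y) in outcomes if x + y == 5]

# Each form has probability 1/36; the total probability is their sum.
partial = [Fraction(1, len(outcomes)) for _ in forms]
total = sum(partial)

print(forms)   # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(total)   # 1/9
```

Exact fractions are used so that the sum of the partial probabilities can be compared with 1/9 without rounding.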

If events $A_1, A_2, \ldots A_n$ are not only mutually exclusive, but
“exhaustive,” which means that one of them must necessarily take place,
the probability that one of them will happen is a certainty = 1, so that
we must have

$$(A_1) + (A_2) + \cdots + (A_n) = 1.$$

An event which is not certain may or may not happen; this constitutes
two mutually exclusive cases. It is customary to call nonoccurrence of a
certain event $A$ the event “opposite” to $A$, and we shall denote it
by the symbol $\bar{A}$. Now $A$ and $\bar{A}$ constitute two exhaustive and
mutually exclusive cases. Hence, by the preceding remark

$$(A) + (\bar{A}) = 1.$$

That is, if $p$ is the probability of $A$,

$$q = 1 - p$$

represents the probability that $A$ will not occur.

If an event A is considered in connection with another event B,
the compound event AB consists in simultaneous occurrence of A and B.
For three events A, B, C, the compound event ABC consists in
simultaneous occurrence of A and B and C, and so on for any number of
component events. We shall denote the probability of a compound
event AB . . . L by the symbol

(AB . . . L).

An event $A$ can materialize in two mutually exclusive forms, namely,
as $A$ and $B$ or $A$ and $\bar{B}$. Hence, by the theorem of total probability

$$(A) = (AB) + (A\bar{B}).$$

Similarly

$$(B) = (BA) + (B\bar{A}),$$

or, since the symbol $(BA)$ does not depend upon the order of letters,

$$(B) = (AB) + (\bar{A}B).$$

The sum $(A) + (B)$ can be expressed as

$$(A) + (B) = (AB) + [(AB) + (A\bar{B}) + (\bar{A}B)].$$

Again, by the theorem of total probabilities, the sum

$$(AB) + (A\bar{B}) + (\bar{A}B)$$

represents the probability $(A + B)$ of the occurrence of at least one of
the events $A$ or $B$. The preceding equation leads to the useful formula

$$(A + B) = (A) + (B) - (AB) \tag{1}$$

which obviously is a generalization of the theorem of total probability;
for $(AB) = 0$ if $A$ and $B$ are incompatible. Equation (1) can be used to
derive an important inequality. Since $(A + B) \leq 1$, it follows from (1)
that

$$(AB) \geq (A) + (B) - 1.$$

If $B$ itself is a compound event $A_1A_2$, this inequality leads to

$$(AA_1A_2) \geq (A) + (A_1A_2) - 1.$$

But

$$(A_1A_2) \geq (A_1) + (A_2) - 1,$$

and so

$$(AA_1A_2) \geq (A) + (A_1) + (A_2) - 2$$

for three component events. Proceeding in the same manner, we can
establish the following general inequality:

$$(AA_1A_2 \cdots A_{n-1}) \geq (A) + (A_1) + (A_2) + \cdots + (A_{n-1}) - (n - 1).$$

Applying this inequality to events $\bar{A}, \bar{A}_1, \ldots \bar{A}_{n-1}$
respectively opposite to $A, A_1, \ldots A_{n-1}$, we get

$$(\bar{A}\bar{A}_1 \cdots \bar{A}_{n-1}) \geq (\bar{A}) + (\bar{A}_1) + \cdots + (\bar{A}_{n-1}) - (n - 1),$$

or, since $(\bar{A}_i) = 1 - (A_i)$,

$$(A) + (A_1) + \cdots + (A_{n-1}) \geq 1 - (\bar{A}\bar{A}_1 \cdots \bar{A}_{n-1}).$$

Now the compound event $\bar{A}\bar{A}_1 \ldots \bar{A}_{n-1}$ means that
neither $A$ nor $A_1$, . . . nor $A_{n-1}$ occurs. The event opposite to this
is that at least one of the events $A, A_1, \ldots A_{n-1}$ occurs. Hence,

$$1 - (\bar{A}\bar{A}_1 \cdots \bar{A}_{n-1}) = (A + A_1 + \cdots + A_{n-1})$$

and we reach the following important inequality:

$$(A + A_1 + \cdots + A_{n-1}) \leq (A) + (A_1) + \cdots + (A_{n-1}).$$
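
Both inequalities can be tried on a small sample space. The sketch below is an illustration only; the three events on two dice are hypothetical choices made for this example, with probabilities large enough for the lower bound to be informative.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of two dice (an assumed example).
space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a set of outcomes."""
    return Fraction(len(event), len(space))

# Three hypothetical events A, A1, A2.
A = {w for w in space if w[0] != 1}        # first die does not show 1
A1 = {w for w in space if w[1] != 1}       # second die does not show 1
A2 = {w for w in space if sum(w) != 12}    # the total is not 12
events = [A, A1, A2]
n = len(events)

# (A A1 A2) >= (A) + (A1) + (A2) - (n - 1)
joint = prob(A & A1 & A2)
lower_bound = sum(prob(E) for E in events) - (n - 1)

# The second inequality, applied here to the opposite events.
opposites = [space - E for E in events]
union_of_opposites = prob(set.union(*opposites))
sum_of_opposites = sum(prob(E) for E in opposites)

print(joint, lower_bound)                     # 2/3 23/36
print(union_of_opposites, sum_of_opposites)   # 1/3 13/36
```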

Equation (1) can be extended to the case of more than two events.
Let $B$ mean the occurrence of at least one of the events $A_1$ or $A_2$. Then
by (1)

$$(A + A_1 + A_2) = (A) + (A_1 + A_2) - (AB).$$

As to $(A_1 + A_2)$, its expression is given by (1). The compound event
$AB$ means the occurrence of at least one of the events $AA_1$ or $AA_2$.
Hence, applying equation (1) once more, we find

$$(AB) = (AA_1 + AA_2) = (AA_1) + (AA_2) - (AA_1A_2)$$

and after due substitutions

$$(A + A_1 + A_2) = (A) + (A_1) + (A_2) - (AA_1) - (AA_2) - (A_1A_2) + (AA_1A_2).$$

Proceeding in the same way and using mathematical induction, the
following general formula can be established:

$$(A + A_1 + \cdots + A_{n-1}) = \sum (A_i) - \sum (A_iA_j) + \sum (A_iA_jA_k) - \cdots$$

where $A_0$ stands for $A$ and the summations refer to all combinations of
subscripts taken from numbers 0, 1, 2, . . . n - 1, one, two, three, . . . ,
and n at a time.
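
The general formula can be compared with a direct count on a finite sample space. The following sketch is illustrative only; the helper `union_by_formula` and the three events on two dice are assumptions introduced for this example.

```python
from fractions import Fraction
from itertools import combinations, product

space = set(product(range(1, 7), repeat=2))   # two dice, 36 equally likely cases

def prob(event):
    return Fraction(len(event), len(space))

def union_by_formula(events):
    """Alternating sum over all combinations of events taken 1, 2, ... at a time."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = 1 if r % 2 == 1 else -1
        for group in combinations(events, r):
            total += sign * prob(set.intersection(*group))
    return total

# Hypothetical events: first die shows 6, second die shows 6, total is 7.
events = [
    {w for w in space if w[0] == 6},
    {w for w in space if w[1] == 6},
    {w for w in space if sum(w) == 7},
]

print(union_by_formula(events))        # 5/12
print(prob(set.union(*events)))        # 5/12
```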

6. Let A and B be two events whose probabilities are (A) and (B).
It is understood that the probability (A) is determined without any
regard to B when nothing is known about the occurrence or nonoccurrence
of B. When it is known that B occurred, A may have a different
probability, which we shall denote by the symbol (A, B) and call
“conditional probability of A, given that B has actually happened.”

Now we can state the second fundamental theorem, called the
“theorem of compound probability” or “theorem of multiplication of
probabilities,” as follows:

Theorem of Compound Probability. The probability of simultaneous
occurrence of A and B is given by the product of the unconditional probability
of the event A by the conditional probability of B, supposing that A actually
occurred. In other words,

$$(AB) = (A) \cdot (B, A).$$

Proof. Let $N$ denote the total number of equally likely cases among
which $m$ cases are favorable to the event $A$. The cases favorable to $A$
and $B$ are to be found among the $m$ cases favorable to $A$. Let their
number be $m_1$. Then, by the definition of probability,

$$(AB) = \frac{m_1}{N},$$

which also can be written thus:

$$(AB) = \frac{m}{N} \cdot \frac{m_1}{m}.$$

Now the ratio $m/N$ represents the probability of $A$. To find the meaning
of the second factor, we observe that, assuming the occurrence of $A$,
there are only $m$ equally likely cases left (the remaining $N - m$ cases
becoming impossible) out of which $m_1$ are favorable to $B$. Hence the
ratio $m_1/m$ represents the conditional probability $(B, A)$ of $B$ supposing
that $A$ has actually happened.

Now since

$$\frac{m}{N} = (A), \quad \frac{m_1}{m} = (B, A),$$

the probability of the compound event $AB$ is expressed by the product

$$(AB) = (A) \cdot (B, A).$$

Since the compound event AB involves A and B symmetrically, 
we shall have also 


(AB) = (B) • (A, B). 

The theorem of compound probability can easily be extended to several
events. For example, let us consider three events A, B, C. The
occurrence of A and B and C is evidently equivalent to the occurrence of the
compound event AB and C. We have, therefore,

$$(ABC) = (AB) \cdot (C, AB)$$

by the theorem of compound probability. By the same theorem

$$(AB) = (A) \cdot (B, A),$$

so that

$$(ABC) = (A) \cdot (B, A) \cdot (C, AB).$$

Obviously this formula can be extended to compound events
consisting of more than three components.
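
The product formula for three events can be verified by counting equally likely cases, exactly as in the proof. The sketch below is illustrative; the urn with 3 white and 2 black balls and the helper `prob` are assumptions, not taken from the text.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical urn: 3 white and 2 black balls, three drawn without replacement.
balls = ["w1", "w2", "w3", "b1", "b2"]
cases = list(permutations(balls, 3))       # 60 equally likely ordered draws

def prob(event, given=lambda c: True):
    """Ratio of favorable cases among those compatible with the condition."""
    restricted = [c for c in cases if given(c)]
    favorable = [c for c in restricted if event(c)]
    return Fraction(len(favorable), len(restricted))

A = lambda c: c[0].startswith("w")         # first ball white
B = lambda c: c[1].startswith("w")         # second ball white
C = lambda c: c[2].startswith("w")         # third ball white
AB = lambda c: A(c) and B(c)
ABC = lambda c: AB(c) and C(c)

lhs = prob(ABC)                                            # (ABC)
rhs = prob(A) * prob(B, given=A) * prob(C, given=AB)       # (A)(B, A)(C, AB)
print(lhs, rhs)                                            # 1/10 1/10
```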

In one particular but very important case, the expression for the
compound probability can be simplified; namely, in the case of so-called
“independent events.” Several events are “independent” by definition
if the probability of any one of them is not affected by supplementary
knowledge concerning the materialization of any number of the remaining
events. For instance, if A and B represent white balls drawn from
two different urns, the probability of A is the same whether the color
of the ball drawn from the other urn is known or not. Similarly, granted
that a coin is unbiased, heads at the first throw and heads at the second
throw are independent events. In such theoretical cases the
independence of events can be reasonably assumed or agreed upon. In other
cases, and especially in practical applications, it is not easy to decide
whether events should be considered as independent or not.

If A and B are independent, the conditional probability (B, A) is
the same as the probability (B) found without any reference to A; this
follows from the definition of independence. Hence, the expression of
compound probability (AB) for two independent events becomes

$$(AB) = (A) \cdot (B)$$

so that the probability of a compound event with independent
components is simply equal to the product of the probabilities of component
events. This rule extends to any number of component events if they
are independent. Let us consider three independent events A, B, and C.

The independence of these events implies

$$(B, A) = (B); \quad (C, AB) = (C)$$

and hence

$$(ABC) = (A) \cdot (B) \cdot (C)$$

in accordance with the rule.

To illustrate the theorem of compound probability, let us consider
two simple examples. An urn contains 2 white balls and 3 black ones.
Two balls are drawn, and it is required to find the probability that they
are both white. Let A be the event consisting in the white color of the
first ball, and B the event consisting in the white color of the second ball.
The probability (A) of extracting a white ball in the first place is

$$(A) = \frac{2}{5}.$$

To find the conditional probability (B, A) we observe, after drawing one
white ball, that 1 white and 3 black balls remain in the urn. The
probability of drawing a white ball under such circumstances is

$$(B, A) = \frac{1}{4}.$$

Now, by the theorem of compound probability, we shall have

$$(AB) = \frac{2}{5} \cdot \frac{1}{4} = \frac{1}{10}.$$

Evidently, in this example we dealt with dependent events.

As an example of independent events, let a coin be tossed any given
number of times; say, n times. What is the probability of having only
heads? The compound event in this example consists of n independent
components; namely, heads at every trial. Now the probability of
heads in any trial is $\frac{1}{2}$, and so the required probability will be $1/2^n$.
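
Both examples can be reproduced by enumeration. This is a minimal sketch, assuming the urn holds 2 white and 3 black balls as in the first example; the variable and function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

# Urn example: 2 white and 3 black balls, two drawn without replacement.
urn = ["w", "w", "b", "b", "b"]
draws = list(permutations(range(len(urn)), 2))    # ordered pairs of distinct balls
first_white = [d for d in draws if urn[d[0]] == "w"]
both_white = [d for d in first_white if urn[d[1]] == "w"]

p_A = Fraction(len(first_white), len(draws))               # (A)
p_B_given_A = Fraction(len(both_white), len(first_white))  # (B, A)
p_AB = Fraction(len(both_white), len(draws))               # (AB)
print(p_A, p_B_given_A, p_AB)                              # 2/5 1/4 1/10

# Coin example: n independent tosses, heads every time (n >= 1).
def all_heads(n):
    tosses = list(product("HT", repeat=n))
    favorable = [t for t in tosses if set(t) == {"H"}]
    return Fraction(len(favorable), len(tosses))

print(all_heads(5))                                        # 1/32
```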
Note: Two events A and B are independent by definition, if

$$(A, B) = (A) \quad \text{and} \quad (B, A) = (B).$$

However, one of these conditions follows from the other. Suppose the condition

$$(A, B) = (A)$$

is fulfilled, so that A is independent of B. We have then

$$(AB) = (B) \cdot (A).$$

On the other hand,

$$(AB) = (A) \cdot (B, A),$$

whence

$$(B, A) = (B),$$

so that B is independent of A.

Three events A, B, C are independent if the following four conditions are fulfilled:

$$(A, B) = (A); \quad (A, C) = (A); \quad (B, C) = (B); \quad (C, AB) = (C).$$

From the first three conditions it follows that

$$(B, A) = (B); \quad (C, A) = (C); \quad (C, B) = (C).$$

To show that the other requirements

$$(B, AC) = (B); \quad (A, BC) = (A)$$

are also fulfilled, we notice that

$$(ABC) = (A) \cdot (B, A) \cdot (C, AB) = (A) \cdot (B) \cdot (C)$$

because (C, AB) = (C) by hypothesis and (B, A) = (B) as proved. On the other
hand,

$$(ABC) = (A) \cdot (C, A) \cdot (B, AC)$$

and (C, A) = (C). Hence, comparing with the preceding expression,

$$(B, AC) = (B).$$

Similarly, it can be shown that

$$(A, BC) = (A).$$

The independence of four events A, B, C, D is assured if the following 11 conditions
are fulfilled:

$$(A, B) = (A, C) = (A, D) = (A); \quad (B, C) = (B, D) = (B); \quad (C, D) = (C);$$

$$(C, AB) = (C); \quad (D, AB) = (D, AC) = (D, BC) = (D); \quad (D, ABC) = (D).$$

And in general, independence of n events is assured if $2^n - n - 1$ conditions of
similar type are fulfilled.

If several events are independent, every two of them are independent; but this
does not suffice for the independence of all events, as can be shown by a simple
example. An urn contains four tickets with numbers 112, 121, 211, 222, and one
ticket is drawn. What are the probabilities that the first, second, or third digits
in its number are 1? Let the occurrence of 1 as the first, second, or third digit be
represented, respectively, by A, B, or C. Then

$$(A) = (B) = (C) = \frac{2}{4} = \frac{1}{2}.$$

Compound probabilities (AB), (AC), (BC) are

$$(AB) = (AC) = (BC) = \frac{1}{4},$$

since among four tickets there is only one whose number has first and second, or
first and third, or second and third digits of 1. Now, for instance,

$$(AB) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = (A) \cdot (B),$$

whence A and B are independent. Similarly, A and C; C and B are independent.
Thus, any two of the events A, B, C are independent, but not all three events are.
For, if they were, we should have

$$(ABC) = \frac{1}{8}.$$

But (ABC) = 0 since in no ticket are all three digits equal to 1.
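
The ticket example is small enough to check exhaustively. The sketch below is illustrative; the helper `prob` and its position arguments are assumptions made for this example.

```python
from fractions import Fraction

tickets = ["112", "121", "211", "222"]

def prob(*positions):
    """Probability that every listed digit position (0, 1, 2) shows a 1."""
    favorable = [t for t in tickets if all(t[i] == "1" for i in positions)]
    return Fraction(len(favorable), len(tickets))

pA, pB, pC = prob(0), prob(1), prob(2)
pAB, pAC, pBC = prob(0, 1), prob(0, 2), prob(1, 2)
pABC = prob(0, 1, 2)

print(pAB == pA * pB, pAC == pA * pC, pBC == pB * pC)   # True True True
print(pABC, pA * pB * pC)                                # 0 1/8
```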

7. The theorems of total and compound probability form the foundation
of the theory of probability as it represents a separate branch of
mathematical science. They serve the purpose of finding probabilities
in more complicated cases, either by being directly applied or by enabling
us to form equations from which the required probabilities can be found.
A few selected problems will illustrate the various ways of using these
theorems.

Problem 14. An urn contains $a$ white balls and $b$ black balls; another
contains $c$ white and $d$ black balls. One ball is transferred from the first
urn into the second, and then a ball is drawn from the latter. What is
the probability that it will be a white ball?

Solution. The event consisting in the white color of the ball drawn
from the second urn can materialize under two mutually exclusive forms:
when the transferred ball is a white one, and when it is black. By the
theorem of total probability, we must find the probabilities corresponding
to these two forms. To find the probability of the first form, we observe
that it represents a compound event consisting in the white color of the
transferred ball, combined with the white color of the extracted ball.
The probability that the transferred ball is white is given by the fraction

$$\frac{a}{a + b}$$

and the probability that the ball removed from the second urn is white, is

$$\frac{c + 1}{c + d + 1}$$

because before the drawing there were $c + 1$ white balls and $d$ black
balls in the second urn. Hence, by the theorem of compound probability,
the probability of the first form is

$$\frac{a(c + 1)}{(a + b)(c + d + 1)}.$$

In the same way, we find that the probability of the second form is

$$\frac{bc}{(a + b)(c + d + 1)}$$

and the sum of these two numbers

$$\frac{ac + bc + a}{(a + b)(c + d + 1)}$$

gives the probability of extracting a white ball from the second urn, after
one ball of unknown color has been transferred from the first urn.
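
The closed form can be compared with the sum of the two forms computed separately. A minimal sketch follows; the function names are illustrative assumptions.

```python
from fractions import Fraction

def white_from_second_urn(a, b, c, d):
    """Closed form (ac + bc + a) / ((a + b)(c + d + 1))."""
    return Fraction(a * c + b * c + a, (a + b) * (c + d + 1))

def by_total_probability(a, b, c, d):
    """Sum of the probabilities of the two mutually exclusive forms."""
    first_form = Fraction(a, a + b) * Fraction(c + 1, c + d + 1)   # white transferred
    second_form = Fraction(b, a + b) * Fraction(c, c + d + 1)      # black transferred
    return first_form + second_form

print(white_from_second_urn(3, 2, 4, 5))   # 23/50
print(by_total_probability(3, 2, 4, 5))    # 23/50
```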

8. Problem 15. Two players agree to play under the following
conditions: Taking turns, they draw the balls out of an urn containing
$a$ white balls and $b$ black balls, one ball at a time. He who extracts the
first white one wins the game. What is the probability that the player
who starts will win the game?

Solution. Let A be the player who draws the first ball, and let B
be the other player. The game can be won by A, first, if he extracts a
white ball at the start; second, if A and B alternately extract 2 black
balls and then A draws a white one; third, if A and B alternately extract
4 black balls and the fifth ball drawn by A is white; and so on. By the
theorem of total probability, the probability for A to win the game
is the sum of the probabilities of the mutually exclusive ways (described
above) in which he can win the game. The probability of extracting a
white ball at first is

$$\frac{a}{a + b}.$$

The probability of extracting 2 black balls and then 1 white ball is found
by direct application of the theorem of compound probabilities. Its
expression is

$$\frac{b(b - 1)a}{(a + b)(a + b - 1)(a + b - 2)}.$$

The probability of extracting 4 black balls and then 1 white ball is given
by

$$\frac{b(b - 1)(b - 2)(b - 3)a}{(a + b)(a + b - 1)(a + b - 2)(a + b - 3)(a + b - 4)},$$

using the same theorem of compound probability.

In the same way we deal with all the possible and mutually exclusive
ways which would allow A to win the game. Then, by adding the above
given expressions of partial probabilities, we obtain the expression for the
required probability in the form of the sum

$$P = \frac{a}{a + b}\left[1 + \frac{b(b - 1)}{(a + b - 1)(a + b - 2)} + \frac{b(b - 1)(b - 2)(b - 3)}{(a + b - 1)(a + b - 2)(a + b - 3)(a + b - 4)} + \cdots\right].$$

The law of formation of different terms in this sum is obvious, and
the sum automatically ends as soon as we arrive at a term which is equal
to zero.

In the same way, we can find that the probability for the player B
to win is expressed by an analogous sum:

$$Q = \frac{a}{a + b}\left[\frac{b}{a + b - 1} + \frac{b(b - 1)(b - 2)}{(a + b - 1)(a + b - 2)(a + b - 3)} + \cdots\right].$$

But one of the players, A or B, must win the game, and the winning of
the game by A and B are opposite events. Hence,

$$P + Q = 1$$

or, after substituting the above expressions for P and Q and after obvious
simplifications,

$$1 + \frac{b}{a + b - 1} + \frac{b(b - 1)}{(a + b - 1)(a + b - 2)} + \cdots = \frac{a + b}{a}.$$

This is a noteworthy identity, obtained, as we see, by the principles
of the theory of probability. Of course, it can be proved in a direct
way, and it would be a good problem for students to attempt a direct
proof. There are many cases in which, by means of considerations
belonging to the theory of probability, several identities or inequalities
can be established whose direct proof sometimes involves considerable
difficulty.
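
The sums for P and Q, and the identity they imply, can be evaluated exactly for given a and b. This is a sketch under the assumption that a is at least 1; the function names are illustrative.

```python
from fractions import Fraction

def first_white_at(a, b, j):
    """Probability that the first white ball appears at draw j (counting from 1)."""
    p = Fraction(1)
    for i in range(j - 1):                     # j - 1 black balls come first
        p *= Fraction(b - i, a + b - i)
    return p * Fraction(a, a + b - (j - 1))

def win_probabilities(a, b):
    """P for the player who starts (odd draws) and Q for the other (even draws)."""
    P = sum(first_white_at(a, b, j) for j in range(1, b + 2, 2))
    Q = sum(first_white_at(a, b, j) for j in range(2, b + 2, 2))
    return P, Q

P, Q = win_probabilities(2, 3)
print(P, Q)                                    # 3/5 2/5

# Left side of the identity: 1 + b/(a+b-1) + b(b-1)/((a+b-1)(a+b-2)) + ...
a, b = 2, 3
identity_sum = (P + Q) * Fraction(a + b, a)
print(identity_sum)                            # 5/2
```

Since every term vanishes once the black balls are exhausted, the loops stop at draw b + 1, which mirrors the remark that the sum ends automatically.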

9. Problem 16. Each of $k$ urns contains $n$ identical balls numbered
from 1 to $n$. One ball is drawn from every urn. What is the probability
that $m$ is the greatest number drawn?

Solution. Let us denote by $P_m$ the required probability. It is not
apparent how we can find the explicit expression for this probability, but
using the theorems of total and compound probability, we can form
equations which yield the desired expression for $P_m$ without any difficulty.
To this end, let us first find the probability $P$ that the greatest number
drawn does not exceed $m$. It is obvious that this may happen in $m$
mutually exclusive ways; namely, when the greatest number drawn is
1, 2, 3, and so on up to $m$. The probabilities of these different hypotheses
being $P_1, P_2, \ldots P_m$, their sum gives the following first expression for
$P$:

$$P = P_1 + P_2 + \cdots + P_m. \tag{1}$$

We can find the second expression for $P$ using the theorem of
compound probability; namely, the greatest number drawn does not exceed
$m$ if balls drawn from all urns have numbers from 1 to $m$. The
probability of drawing a ball with the number 1, 2, 3, . . . $m$ from any urn is
$m/n$. And the event that this will happen for every urn is a
compound event consisting of $k$ independent events with the same
probability $m/n$. Therefore, by the theorem of compound probability

$$P = \frac{m^k}{n^k}.$$

And this compared with (1) gives the equation

$$P_1 + P_2 + \cdots + P_m = \frac{m^k}{n^k}. \tag{2}$$

Substituting $m - 1$ for $m$ in this equation, we get

$$P_1 + P_2 + \cdots + P_{m-1} = \frac{(m - 1)^k}{n^k}$$

and it suffices to subtract this from (2) to have the required expression for
$P_m$:

$$P_m = \frac{m^k - (m - 1)^k}{n^k}.$$
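
For small n and k the result can be confirmed by listing every possible set of draws. The sketch below is illustrative; both function names are assumptions.

```python
from fractions import Fraction
from itertools import product

def greatest_is(m, n, k):
    """Closed form (m^k - (m - 1)^k) / n^k."""
    return Fraction(m ** k - (m - 1) ** k, n ** k)

def greatest_is_brute(m, n, k):
    """Count the draws (one ball from each of k urns) whose greatest number is m."""
    draws = list(product(range(1, n + 1), repeat=k))
    favorable = [d for d in draws if max(d) == m]
    return Fraction(len(favorable), len(draws))

n, k = 4, 3
print([greatest_is(m, n, k) for m in range(1, n + 1)])
print(sum(greatest_is(m, n, k) for m in range(1, n + 1)))   # 1
```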


10. Problem 17. Two persons, A and B, have respectively $n + 1$
and $n$ coins, which they toss simultaneously. What is the probability
that A will have more heads than B?

Solution. Let $\mu$, $\mu'$ and $\nu$, $\nu'$ be numbers of heads and tails thrown
by A and B, respectively, so that $\mu + \nu = n + 1$, $\mu' + \nu' = n$. The
required probability $P$ is the probability of the inequality $\mu > \mu'$. The
probability $1 - P$ of the opposite event $\mu \leq \mu'$ is at the same time
the probability of the inequality $\nu > \nu'$; that is, $1 - P$ is the probability
that A will throw more tails than B. By reason of symmetry $1 - P = P$,
whence $P = \frac{1}{2}$.
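
The symmetry argument can be checked by exhaustive enumeration for small n. A minimal sketch, with an illustrative function name:

```python
from fractions import Fraction
from itertools import product

def a_has_more_heads(n):
    """Exact probability that n + 1 coins show more heads than n coins."""
    favorable = 0
    total = 0
    for a_coins in product((0, 1), repeat=n + 1):      # 1 stands for heads
        for b_coins in product((0, 1), repeat=n):
            total += 1
            if sum(a_coins) > sum(b_coins):
                favorable += 1
    return Fraction(favorable, total)

print([a_has_more_heads(n) for n in range(5)])
```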

11. Problem 18. Three players A, B, and C agree to play a series of
games observing the following rules: two players participate in each game,
while the third is idle, and the game is to be won by one of them. The
loser in each game quits, and his place in the next game is taken by the
player who was idle. The player who succeeds in winning over both
of his opponents without interruption wins the whole series of games.
Supposing that the probability for each player to win a single game is
$\frac{1}{2}$ and that the first game is played by A and B, find the probability for
A, B, and C, respectively, to win the whole series, if (a) the number of
games to be played is limited and may not exceed a given number $n$;
if (b) the number of games is unlimited.

Solution. Let $P_n$, $Q_n$, $R_n$ be the probabilities for A, B, and C,
respectively, to win a series of games when their number cannot exceed $n$. By
reason of symmetry, $P_n = Q_n$ so that it remains to find $P_n$ and $R_n$.
The player A can win the whole series of games in two mutually exclusive
ways: if he wins the first game, or if he loses the first game. Let the
probability of the first case be $p_n$ and that of the second $r_n$. Then

$$P_n = p_n + r_n.$$

A can win the whole series after winning the first game, in two mutually
exclusive ways: (a) if he wins over B and C in succession; (b) if he wins
the first game from B and loses the second game to C; then, if in the third
game C loses to B, and in the fourth game A wins over B and later wins
the whole series of not more than $n - 3$ games. Now, the probability
of case (a) is $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$ by the theorem of compound probability;
that of case (b) by the same theorem is $\frac{1}{8}p_{n-3}$, and the total probability is

$$p_n = \frac{1}{4} + \frac{1}{8}p_{n-3}. \tag{1}$$

If A loses the first game to B, but wins the whole series, then in the
second game C wins over B while the third game is won by A, and not
more than $n - 2$ games are left to play. Hence,

$$r_n = \frac{1}{4}p_{n-2}. \tag{2}$$

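Since evidently $p_2 = p_3 = p_4 = \frac{1}{4}$, equation (1) by successive
substitutions yields

$$p_{3k} = \frac{1}{4} + \frac{1}{4 \cdot 8} + \cdots + \frac{1}{4 \cdot 8^{k-1}} = \frac{2}{7}\left(1 - \frac{1}{8^k}\right)$$

$$p_{3k+1} = \frac{1}{4} + \frac{1}{4 \cdot 8} + \cdots + \frac{1}{4 \cdot 8^{k-1}} = \frac{2}{7}\left(1 - \frac{1}{8^k}\right)$$

$$p_{3k+2} = \frac{1}{4} + \frac{1}{4 \cdot 8} + \cdots + \frac{1}{4 \cdot 8^{k}} = \frac{2}{7}\left(1 - \frac{1}{8^{k+1}}\right)$$

or, in condensed form for an arbitrary $n$

$$p_n = \frac{2}{7}\left(1 - \frac{1}{8^{\left[\frac{n+1}{3}\right]}}\right)$$

denoting by $[x]$ the greatest integer contained in $x$. Hence, by virtue of
(2) the general expression of $r_n$ will be

$$r_n = \frac{1}{14}\left(1 - \frac{1}{8^{\left[\frac{n-1}{3}\right]}}\right)$$

and that of $P_n$, $Q_n$

$$P_n = Q_n = \frac{5}{14} - \frac{2}{7} \cdot \frac{1}{8^{\left[\frac{n+1}{3}\right]}} - \frac{1}{14} \cdot \frac{1}{8^{\left[\frac{n-1}{3}\right]}}.$$

Finally, to find the probability for C to win, we observe that this can
happen only if C wins the second game; hence,

$$R_n = p_{n-1} = \frac{2}{7} - \frac{2}{7} \cdot \frac{1}{8^{\left[\frac{n}{3}\right]}}.$$

Since $P_n + Q_n + R_n < 1$, the difference

$$1 - P_n - Q_n - R_n = \frac{4}{7} \cdot \frac{1}{8^{\left[\frac{n+1}{3}\right]}} + \frac{2}{7} \cdot \frac{1}{8^{\left[\frac{n}{3}\right]}} + \frac{1}{7} \cdot \frac{1}{8^{\left[\frac{n-1}{3}\right]}}$$

represents the probability of a tie in $n$ games. This probability decreases
rapidly when $n$ increases, so that in a long series of games a tie is
practically impossible. If the number of games is not limited, the
probabilities $P$, $Q$, $R$ for A, B, C, respectively, to win are obtained as limits of
$P_n$, $Q_n$, $R_n$, when $n$ increases indefinitely. Thus

$$P = Q = \frac{5}{14}; \quad R = \frac{2}{7}.$$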
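
The closed expressions for $P_n$, $Q_n$, $R_n$ can be compared with a direct computation that follows the rules of the series game by game. This is a sketch only; the function names and the recursive bookkeeping are assumptions introduced for illustration, and the closed forms are coded as $P_n = Q_n = \frac{5}{14} - \frac{2}{7} \cdot 8^{-[(n+1)/3]} - \frac{1}{14} \cdot 8^{-[(n-1)/3]}$ and $R_n = \frac{2}{7} - \frac{2}{7} \cdot 8^{-[n/3]}$.

```python
from fractions import Fraction

def series_probabilities(n):
    """Exact probabilities that A, B, C win a series of at most n games."""
    half = Fraction(1, 2)
    wins = {"A": Fraction(0), "B": Fraction(0), "C": Fraction(0)}

    def play(games_left, players, idle, previous_winner, weight):
        if games_left == 0:
            return                                   # undecided series: a tie
        for winner in players:
            loser = players[0] if winner == players[1] else players[1]
            if winner == previous_winner:
                wins[winner] += weight * half        # two wins without interruption
            else:
                play(games_left - 1, (winner, idle), loser, winner, weight * half)

    play(n, ("A", "B"), "C", None, Fraction(1))
    return wins["A"], wins["B"], wins["C"]

def closed_form(n):
    """Closed expressions for P_n, Q_n, R_n (n >= 1), with [x] as floor division."""
    x = Fraction(1, 8 ** ((n + 1) // 3))
    y = Fraction(1, 8 ** ((n - 1) // 3))
    z = Fraction(1, 8 ** (n // 3))
    P = Fraction(5, 14) - Fraction(2, 7) * x - Fraction(1, 14) * y
    R = Fraction(2, 7) - Fraction(2, 7) * z
    return P, P, R

for n in (2, 3, 4, 10):
    print(n, series_probabilities(n), closed_form(n))
```

For large n both computations approach 5/14, 5/14, and 2/7.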


Problems for Solution 

1. Three urns contain respectively 1 white and 2 black balls; 3 white and 1 black
ball; 2 white and 3 black balls. One ball is taken from each urn. What is the
probability that among the balls drawn there are 2 white and 1 black? Ans. $\frac{23}{60}$.

2. Cards are drawn one by one from a full deck. What is the probability that
10 cards will precede the first ace? Ans. $\frac{164}{4165} = 0.03938$.

3. Urn 1 contains 10 white and 3 black balls; urn 2 contains 3 white and 5 black
balls. Two balls are transferred from No. 1 and placed in No. 2 and then one ball is
taken from the latter. What is the probability that it is a white ball? Ans. $\frac{59}{130}$.

4. Two urns identical in appearance contain respectively 3 white and 2 black balls;
2 white and 5 black balls. One urn is selected and a ball taken from it. What is the
probability that this ball is white? Ans. $\frac{31}{70}$.

5. What is the probability that 5 tickets drawn in the French lottery all have
one-digit numbers? Ans. $\frac{7}{2441626} \approx 2.9 \cdot 10^{-6}$.

6. What is the probability that each of the four players in a bridge game will get a
complete suit of cards? Ans. $24 \cdot \frac{(13!)^4}{52!} = 4.474 \cdot 10^{-28}$.

7. What is the probability that at least one of the players in a bridge game will
get a complete suit of cards? Ans.

$$\frac{16 \cdot 13! \cdot 39! - 72 \cdot (13!)^2 \cdot 26! + 72 \cdot (13!)^4}{52!} = 2.52 \cdot 10^{-11}.$$

See Sec. 5, page 31.

8. From an urn with $a$ white and $b$ black balls $n$ balls are taken. Find the
probability of drawing at least one white ball. Ans. The required probability can be
expressed in two ways. First expression:

$$1 - \frac{b(b - 1) \cdots (b - n + 1)}{(a + b)(a + b - 1) \cdots (a + b - n + 1)}.$$

Second expression:

$$\frac{a}{a + b}\left[1 + \frac{b}{a + b - 1} + \cdots + \frac{b(b - 1) \cdots (b - n + 2)}{(a + b - 1)(a + b - 2) \cdots (a + b - n + 1)}\right].$$

Equating them, we have an identity

$$1 + \frac{b}{a + b - 1} + \cdots + \frac{b(b - 1) \cdots (b - n + 2)}{(a + b - 1)(a + b - 2) \cdots (a + b - n + 1)} = \frac{a + b}{a}\left[1 - \frac{b(b - 1) \cdots (b - n + 1)}{(a + b)(a + b - 1) \cdots (a + b - n + 1)}\right].$$

9. Three players A, B, C in turn draw balls from an urn with 10 white and 10 black
balls, taking one ball at a time. He who extracts the first white ball wins the game.
Supposing that they start in the order A, B, C, find the probabilities for each of them
to win the game. Ans. For A, 0.56584; for B, 0.29144; for C, 0.14271.

10. If $n$ dice are thrown at a time, what is the probability of having each of the
points 1, 2, . . . 6, appear at least once? Find the numerical value of this
probability for $n = 10$. Ans.

$$p_n = 1 - 6\left(\frac{5}{6}\right)^n + 15\left(\frac{4}{6}\right)^n - 20\left(\frac{3}{6}\right)^n + 15\left(\frac{2}{6}\right)^n - 6\left(\frac{1}{6}\right)^n;$$

$$p_{10} = 0.2718.$$

Hint: Use the formula in Sec. 5, page 31.

11. In a lottery $m$ tickets are drawn at a time out of the total number of $n$ tickets,
and returned before the next drawing is made. What is the probability that in $k$
drawings each of the numbers 1, 2, . . . $n$ will appear at least once? Ans.

$$p = 1 - \frac{n}{1}\left(\frac{n - m}{n}\right)^k + \frac{n(n - 1)}{1 \cdot 2}\left(\frac{n - m}{n}\right)^k\left(\frac{n - m - 1}{n - 1}\right)^k - \cdots$$

12. We have $k$ varieties of objects, each variety consisting of the same number of
objects. These objects are drawn one at a time and replaced before the next drawing.
Find the probability that $n$ and no less drawings will be required to produce objects of
all varieties. Ans.

$$k^{n-1}p_n = (k - 1)^{n-1} - \frac{k - 1}{1}(k - 2)^{n-1} + \frac{(k - 1)(k - 2)}{1 \cdot 2}(k - 3)^{n-1} - \cdots$$

13. Three urns contain respectively 1 white, 2 black balls; 2 white, 1 black balls;
2 white, 2 black balls. One ball is transferred from the first urn into the second; then
one from the latter is transferred into the third; finally, one ball is drawn from the
third urn. What is the probability of its being white? Ans. $\frac{31}{60}$.

14. Each of $n$ urns contains $a$ white and $b$ black balls. One ball is transferred
from the first urn into the second, then one ball from the latter into the third, and so
on. Finally, one ball is taken from the last urn. What is the probability of its being
white? Ans. Denote by $p_k$ the probability of drawing a white ball from the $k$th urn.
Then

$$p_{k+1} = \frac{a + 1}{a + b + 1}p_k + \frac{a}{a + b + 1}(1 - p_k); \quad k = 1, 2, \ldots n - 1.$$

Hence,

$$p_n = \frac{a}{a + b}.$$

15. Two players A and B toss two dice, A starting the game. The game is won
by A if he casts 6 points before B casts 7 points; and it is won by B if he casts 7 points
before A casts 6 points. What are the probabilities for A and B to win the game if
they agree to cast dice not more than $n$ times? What is the probability of a tie?
Ans. Probability for A:

$$p_n = \frac{30}{61}\left[1 - \left(\frac{155}{216}\right)^m\right] \quad \text{if } n = 2m$$

$$p_n = \frac{30}{61}\left[1 - \left(\frac{155}{216}\right)^{m+1}\right] \quad \text{if } n = 2m + 1.$$

Probability for B:

$$q_n = \frac{31}{61}\left[1 - \left(\frac{155}{216}\right)^m\right] \quad \text{if } n = 2m \text{ or } n = 2m + 1.$$

Probability of a tie:

$$r_n = \left(\frac{155}{216}\right)^m \text{ if } n = 2m; \quad r_n = \frac{31}{36}\left(\frac{155}{216}\right)^m \text{ if } n = 2m + 1.$$

If $n$ increases indefinitely, $r_n$ converges to 0 and $p_n$, $q_n$ converge to the limits

$$p = \frac{30}{61}, \quad q = \frac{31}{61},$$

which may be considered as the probabilities for A and B to win if the number of
throws is unlimited.

16. The game known as “craps” is played with two dice, and the caster wins
unconditionally if he produces 7 or 11 points (which are called “naturals”); he loses
the game in case of 2, 3, or 12 points (called “craps”). But if he produces 4, 5, 6, 8, 9,
or 10 points, he has the right to cast the dice steadily until he throws the same
number of points he had before or until he throws a 7. If he rolls 7 before obtaining his
point, he loses the game; otherwise, he wins. What is the probability to win?

Ans. $\frac{244}{495} = 0.493$.

17. Prove directly the identity in Prob. 15, page 37.

Solution 1. Let

$$\varphi(c, b) = \frac{b}{c} + \frac{b(b - 1)}{c(c - 1)} + \frac{b(b - 1)(b - 2)}{c(c - 1)(c - 2)} + \cdots$$

where $b$ is a positive integer and $c \geq b$. Then

$$\varphi(c, b) = \frac{b}{c}[1 + \varphi(c - 1, b - 1)]$$

whence

$$\varphi(c, 1) = \frac{1}{c}, \quad \varphi(c, 2) = \frac{2}{c - 1}, \quad \varphi(c, 3) = \frac{3}{c - 2}$$

and in general

$$\varphi(c, b) = \frac{b}{c - b + 1}.$$

Taking $c = a + b - 1$, we have

$$1 + \varphi(a + b - 1, b) = \frac{a + b}{a}.$$

Solution 2. The polynomial

$$S(x) = 1 + \frac{b}{c}x + \frac{b(b - 1)}{c(c - 1)}x^2 + \cdots$$

can be presented in the form of a definite integral

$$S(x) = (c + 1)\int_0^1 (1 - t + xt)^b (1 - t)^{c-b}\,dt$$

whence

$$S(1) = (c + 1)\int_0^1 (1 - t)^{c-b}\,dt = \frac{c + 1}{c - b + 1} = \frac{a + b}{a}$$

if $c = a + b - 1$.

18. Find the approximate expressions for the probabilities $P$ and $Q$ in Prob. 15,
page 36, when $b$ is a large number. Take for numerical application $a = b = 50$.

Solution. Since $P + Q = 1$, it suffices to seek the approximate expression for
$P - Q$. Now

$$P - Q = \frac{a}{a + b}\left[1 - \frac{b}{a + b - 1} + \frac{b(b - 1)}{(a + b - 1)(a + b - 2)} - \cdots\right]$$

whence

$$P - Q = a\int_0^1 (1 - 2t)^b (1 - t)^{a-1}\,dt.$$

To find the approximate expression of this integral, we set

$$(1 - 2u)^b (1 - u)^{a-1} = e^{-v}$$

whence $u$ can be expressed as a power series in $v$:

$$u = \frac{v}{2b + a - 1} - \frac{4b + a - 1}{(2b + a - 1)^3} \cdot \frac{v^2}{2} + \frac{12b^2 + (2b + a - 1)^2}{3(2b + a - 1)^5} \cdot \frac{v^3}{2} - \cdots$$

Substituting the resulting expression of $du/dv$ and integrating with respect to $v$
between limits 0 and $\infty$, we obtain for $P - Q$ an asymptotic expansion whose first
terms are

$$P - Q = \frac{a}{2b + a - 1}\left[1 - \frac{4b + a - 1}{(2b + a - 1)^2}\right] + \frac{a[12b^2 + (2b + a - 1)^2]}{(2b + a - 1)^5} + \frac{(-1)^b}{2^a C_{a+b}^{a}}.$$

A more detailed discussion reveals that the error of this approximate formula is less
than . . . and greater than . . . provided $b \geq 12$. For $a = b = 50$ the formula yields

$$P - Q = 0.3318; \quad P = 0.6659; \quad Q = 0.3341.$$




CHAPTER III 


REPEATED TRIALS 

1. In the theory of probability the word “trial” means an attempt to
produce, in a manner precisely described, an event E which is not certain.
The outcome of a trial is called a “success” if E occurs, and a “failure” if
E fails to occur. For instance, if E represents the drawing of two cards
of the same denomination from a full pack of cards, the “trial” consists
in taking any two cards from the full pack, and we have a success or
failure in this trial according to whether both cards are of the same
denomination or not.

If trials can be repeated, they form a “series” of trials. Regarding
series of trials, the following two problems naturally arise:

a. What is the probability of a given number of successes in a given
series of trials? And as a generalization of this problem:

b. What is the probability that the number of successes will be
contained between two given limits in a given series of trials?

Problems of this kind are among the most important in the theory of 
probability. 

2. Trials are said to be * independent^ ^ in regard to an event E if 
I the probability of this event in any trial remains the same, whether 

the results of any number of other trials are known or not. On the other 
hand, trials are “dependent^^ if the probability of E in a certain trial 
varies according to the information we have about the outcome of one or 
more of the other trials. 

As an example of independent trials, imagine that several times in
succession we draw one ball from an urn containing white and black balls
in given proportion, after each trial returning the ball that has been
drawn, and thoroughly mixing the balls before proceeding to the next
trial. With respect to the color of the balls taken, we may reasonably
assume that these trials are independent. On the other hand, if the
balls already extracted are not returned to the urn, the above described
trials are no longer independent. To illustrate, suppose that the urn
from which the balls are drawn originally contained 2 white and 3 black
balls, and that 4 balls are drawn. What is the probability that the
third ball is white? If nothing is known about the color of the three
other balls, the probability is 2/5. If we know that the first ball is white,
but the colors of the second and fourth balls are unknown, this
probability is 1/4. In general, the probability for any ball to be white (or black)
depends essentially on the amount of information we possess about the
color of the other balls. Since the urn contains a limited number of
balls, series of trials of this kind cannot be continued indefinitely.
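
The two probabilities in the urn illustration above can be checked by listing every equally likely ordered draw. The following is a minimal sketch in Python, not part of the original text; the function name `prob_third_white` is chosen here for illustration only.

```python
from fractions import Fraction
from itertools import permutations

def prob_third_white(first_white_known=False):
    """Exact probability that the third of four balls drawn without
    replacement from an urn with 2 white and 3 black balls is white."""
    urn = ["W", "W", "B", "B", "B"]
    # Every ordered selection of 4 distinct balls is equally likely.
    draws = list(permutations(range(5), 4))
    if first_white_known:
        # Keep only the draws compatible with the extra information.
        draws = [d for d in draws if urn[d[0]] == "W"]
    favorable = sum(1 for d in draws if urn[d[2]] == "W")
    return Fraction(favorable, len(draws))

print(prob_third_white())                        # nothing known about the other balls
print(prob_third_white(first_white_known=True))  # first ball known to be white
```

The same count with the balls returned after each draw would give 2/5 in both cases, which is what independence means here.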

As an example of an indefinite series of dependent trials, suppose that 
we have two urns, the first containing 1 white and 2 black balls, and the 
second, 1 black and 2 white balls, and the trials consist in taking one 
ball at a time from either urn, observing the following rules: (a) the 
first ball is taken from the first urn; (b) after a white ball, the next is
taken from the first urn; after a black one, the next is taken from the
second urn; (c) balls are returned to the same urns from which they were
taken. 

Following these rules, we evidently have a definite series of trials, 
which can be extended indefinitely, and these trials are dependent. 
For if we know that a certain ball was white or black, the probability 
of the next ball being white is 1/3 or 2/3, respectively.

Assuming the independence of trials, the probability of an event E 
may remain constant or may vary from one trial to another. If an 
unbiased coin is tossed several times, we have a series of independent
trials each with the same probability, 1/2, for heads. It is easy to give
an example of a series of independent trials with variable probability for 
the same event. Imagine, for instance, that we have an unlimited 
number of urns with white and black balls, but that the proportion of 
white and black balls varies from urn to urn. One ball is drawn suc¬ 
cessively from each of these urns. Evidently, here we have a series of 
trials independent in regard to the white color of the ball drawn, but 
with the probability of drawing a ball of this color varying from trial to 
trial. 

In this chapter we shall discuss the simplest case of series of
independent trials with constant probability. They are often called
“Bernoullian series of trials” in honor of Jacob Bernoulli who, in his
classical book, “Ars conjectandi” (1713) made a profound study of such series
and was led to the discovery of one of the most important theorems in 
the theory of probability. 

3. Considering a series of n independent trials in which the probability
of an event E is p in every trial (that of the opposite event F being
q = 1 − p), the first problem which presents itself is to find the
probability that E will occur exactly m times, where m is one of the numbers
0, 1, 2, . . . n. In what follows, we shall denote this probability by \(T_m\).
In the extreme cases m = n and m = 0 it is easy to find \(T_n\) and \(T_0\).
When m = n, the event E must occur n times in succession, so that \(T_n\)
represents the probability of the compound event EEE . . . E with n
identical components. These components are independent events, since
the trials are independent, and the probability of each of them is p.
Hence, the compound probability is

\[ T_n = p \cdot p \cdot p \cdots p \quad (n \text{ times}) \]

or

\[ T_n = p^n. \]

The symbol \(T_0\) denotes the probability that E will never occur in n
trials, which is the same as to say that F will occur n times in succession.
Hence, for the same reasons as before,

\[ T_0 = q^n = (1 - p)^n. \]

When m is neither 0 nor n, the event consisting in m occurrences of E 
can materialize in several mutually exclusive forms, each of which may 
be represented by a definite succession of m letters E and n — m letters F. 
For example, if n = 4 and m = 2, we can distinguish the following
mutually exclusive forms corresponding to two occurrences of E:

EEFF, EFEF, EFFE, FEEF, FEFE, FFEE.

To find the number of all the different successions consisting of m 
letters E and n — m letters F, we observe that any such succession is 
determined as soon as we know the places occupied by the letter E.
Now the number of ways to select m places out of the total number of
n places is evidently the number of combinations out of n objects taken
m at a time. Hence, the number of mutually exclusive ways to have
m successes in n trials is

\[ C_n^m = \frac{n(n - 1) \cdots (n - m + 1)}{1 \cdot 2 \cdot 3 \cdots m}. \]


The probability of each succession of m letters E and n − m letters F,
by reason of independence of trials, is represented by the product of
m factors p and n − m factors q, and since the product does not depend
upon the order of factors, this probability will be

\[ p^m q^{n-m} \]

for each succession. Hence, the total probability of m successes in n
trials is given by this simple formula:

\[ T_m = \frac{n(n - 1) \cdots (n - m + 1)}{1 \cdot 2 \cdot 3 \cdots m}\, p^m q^{n-m} \tag{1} \]

which can also be presented thus:

\[ T_m = \frac{n!}{m!\,(n - m)!}\, p^m q^{n-m}. \tag{2} \]

This second form can be used even for m = 0 or m = n if, as usual,
we assume 0! = 1. Either of the expressions (1) or (2) shows that \(T_m\)
may be considered as the coefficient of \(t^m\) in the expansion of

\[ (q + pt)^n \]

according to ascending powers of an arbitrary variable t. In other
words, we have identically

\[ (q + pt)^n = T_0 + T_1 t + T_2 t^2 + \cdots + T_n t^n. \]

For this reason the function

\[ (q + pt)^n \]

is called the “generating function” of probabilities \(T_0, T_1, T_2, \ldots T_n\).
By setting t = 1 we naturally obtain

\[ T_0 + T_1 + T_2 + \cdots + T_n = 1. \]
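
The statement that the probabilities of 0, 1, . . . n successes are the coefficients of the generating function can be checked mechanically. The sketch below is an illustration in Python and not part of the original text: it expands the n-th power of q + pt by repeated multiplication and compares each coefficient with formula (2). The names `T` and `generating_coefficients` are assumptions of this sketch, and exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from math import comb

def T(n, m, p):
    """Probability of exactly m successes in n Bernoullian trials, formula (2)."""
    q = 1 - p
    return comb(n, m) * p**m * q**(n - m)

def generating_coefficients(n, p):
    """Coefficients of (q + p t)^n in ascending powers of t."""
    q = 1 - p
    coeffs = [Fraction(1)]
    for _ in range(n):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] += c * q        # contribution of the term q
            nxt[i + 1] += c * p    # contribution of the term p t
        coeffs = nxt
    return coeffs

n, p = 6, Fraction(1, 3)
coeffs = generating_coefficients(n, p)
print(all(coeffs[m] == T(n, m, p) for m in range(n + 1)))
print(sum(coeffs))   # setting t = 1 adds up all the probabilities
```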

4. The probability P(k, l) that the number of successes m will satisfy
the inequalities (or, simply, the probability of these inequalities)

\[ k \le m \le l \]

where k and l are two given integers, can easily be found by distinguishing
the following mutually exclusive events:

m = k or m = k + 1, . . . or m = l.

Accordingly, by the theorem of total probability,

\[ P(k, l) = T_k + T_{k+1} + \cdots + T_l \]

or, using expression (2),

\[ P(k, l) = \sum_{m=k}^{l} \frac{n!}{m!\,(n - m)!}\, p^m q^{n-m}. \]

In particular, the probability that the number of successes will not
be greater than l is represented by the sum

\[ P(0, l) = q^n + \frac{n}{1}\, p q^{n-1} + \frac{n(n - 1)}{1 \cdot 2}\, p^2 q^{n-2} + \cdots + \frac{n(n - 1) \cdots (n - l + 1)}{1 \cdot 2 \cdots l}\, p^l q^{n-l}. \]

Similarly, the probability that the number of successes in n trials will
not be less than l can be presented thus:

\[ P(l, n) = \frac{n(n - 1) \cdots (n - l + 1)}{1 \cdot 2 \cdots l}\, p^l q^{n-l} \left[ 1 + \frac{n - l}{l + 1}\,\frac{p}{q} + \frac{(n - l)(n - l - 1)}{(l + 1)(l + 2)} \left(\frac{p}{q}\right)^2 + \cdots \right] \]

where the series in the brackets ends by itself.
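
The two ways of writing the probability of at least l successes, as a plain sum of terms and as one term multiplied by a bracketed series, can be compared directly. The following Python sketch is an illustration only; the helper names `P_range` and `P_at_least` are not from the text.

```python
from fractions import Fraction
from math import comb

def T(n, m, p):
    """Probability of exactly m successes in n trials, formula (2)."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

def P_range(n, k, l, p):
    """P(k, l): probability that k <= m <= l, by the theorem of total probability."""
    return sum(T(n, m, p) for m in range(k, l + 1))

def P_at_least(n, l, p):
    """P(l, n) as T_l times the bracketed series 1 + (n-l)/(l+1) * p/q + ..."""
    q = 1 - p
    term = bracket = Fraction(1)
    for j in range(l, n):
        # Each new term is the previous one times T_{j+1}/T_j = (n-j)/(j+1) * p/q.
        term *= Fraction(n - j, j + 1) * p / q
        bracket += term
    return T(n, l, p) * bracket   # the loop stops at j = n - 1: the series ends by itself

p = Fraction(2, 5)
print(P_at_least(12, 4, p) == P_range(12, 4, 12, p))
```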

5. The application of the above established formulas to numerical
examples does not present any difficulty so long as the numbers with
which we have to deal are not large.


Example 1. In tossing 10 coins, what is the probability of having exactly 5 heads?
Tossing 10 different coins at once is the same thing as tossing one coin 10 times, if all
the coins are unbiased, which is assumed. Hence, the required probability is given
by formula (1), where we must take n = 10, m = 5, p = q = 1/2; it is

\[ T_5 = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} \cdot \frac{1}{2^{10}} = \frac{252}{1024} = 0.24609. \]

Example 2. If a person playing a certain game can win $1 with the probability
1/3, and lose twenty-five cents with the probability 2/3, what is the probability of
winning at least $3 in 20 games? Let m be the number of times the game is won. The
total gain (considering a loss as a negative gain) will be

\[ m - \tfrac{1}{4}(20 - m) = \tfrac{5}{4}m - 5 \text{ dollars} \]

and the condition of the problem requires that it should not be less than $3. Hence

\[ \tfrac{5}{4}m - 5 \ge 3 \]

whence m ≥ 32/5 or, since m is an integer, m ≥ 7. That is, in 20 trials an event with
the probability 1/3 must happen at least 7 times and the probability for that is:

\[ \sum_{m=7}^{20} \frac{20!}{m!\,(20 - m)!} \left(\frac{1}{3}\right)^m \left(\frac{2}{3}\right)^{20-m}. \]

This sum contains 14 terms; but it can be expressed through another sum containing
only 7 terms, because

\[ \sum_{m=7}^{20} \frac{20!}{m!\,(20 - m)!} \left(\frac{1}{3}\right)^m \left(\frac{2}{3}\right)^{20-m} = 1 - \sum_{m=0}^{6} \frac{20!}{m!\,(20 - m)!} \left(\frac{1}{3}\right)^m \left(\frac{2}{3}\right)^{20-m}. \]

Using the last expression, one easily gets 0.5207 for the required probability.
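
Both examples are small enough to recompute directly. The Python sketch below is an illustration and not part of the original text. It evaluates Example 1 from formula (2), finds the least number of wins that yields a gain of at least $3 in Example 2, and evaluates the required probability both as the 14-term sum and through the 7-term complement.

```python
from math import comb

def T(n, m, p):
    """Probability of exactly m successes in n trials, formula (2)."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

# Example 1: exactly 5 heads with 10 unbiased coins.
heads5 = T(10, 5, 0.5)

# Example 2: the gain after m wins in 20 games is (5/4)m - 5 dollars.
least_m = min(m for m in range(21) if 1.25 * m - 5 >= 3)
direct = sum(T(20, m, 1 / 3) for m in range(least_m, 21))      # 14 terms
complement = 1 - sum(T(20, m, 1 / 3) for m in range(least_m))  # 7 terms

print(round(heads5, 5), least_m, round(direct, 4), round(complement, 4))
```

The complement form is the one recommended in the text for hand computation because it needs half as many terms; on a computer either form serves.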

6. In the series of probabilities

\[ T_0, T_1, T_2, \ldots T_n \]

for 0, 1, 2, . . . n successes in n trials, the terms generally increase till
the greatest term \(T_\mu\) is reached, and then they steadily decrease. For
instance, if n = 10, p = q = 1/2 the values of the expression

\[ 2^{10}\, T_m \]

for m = 0, 1, 2, . . . 10 are

1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1

so that \(T_5\) is the greatest term. For obvious reasons the number μ (to
which the greatest term in the series of probabilities \(T_0, T_1, \ldots T_n\)
corresponds) is called the “most probable” number of successes.

To prove this observation in general, and to find the rule for obtaining
μ, we observe first that the quotient

\[ \frac{T_{m+1}}{T_m} = \frac{n - m}{m + 1} \cdot \frac{p}{q} \]

decreases with increasing m, so that

\[ \frac{T_1}{T_0} > \frac{T_2}{T_1} > \cdots > \frac{T_n}{T_{n-1}}. \tag{a} \]

The two extreme terms in (a) are

\[ \frac{T_1}{T_0} = \frac{np}{q}; \qquad \frac{T_n}{T_{n-1}} = \frac{p}{nq} \]

and if n is large enough, the first of them is > 1 and the last < 1. To
find exactly how large n must be, we notice that

\[ \frac{np}{q} > 1 \]

if

\[ np > q = 1 - p \]

whence

\[ n + 1 > \frac{1}{p}. \]

Similarly,

\[ \frac{p}{nq} < 1 \]

if

\[ p < nq \quad \text{or} \quad 1 - q < nq \]

whence

\[ n + 1 > \frac{1}{q}. \]

Consequently, if n + 1 is greater than both 1/p and 1/q, the first term
in (a) is > 1 and the last term is < 1. As the terms of (a) form a
decreasing sequence, there must be a last term which is ≥ 1. Let it be

\[ \frac{T_\mu}{T_{\mu-1}}. \]

Then

\[ \frac{T_\mu}{T_{\mu-1}} \ge 1; \qquad \frac{T_{\mu+1}}{T_\mu} < 1 \]

and

\[ \frac{T_1}{T_0} > \frac{T_2}{T_1} > \cdots > \frac{T_\mu}{T_{\mu-1}} \ge 1; \qquad 1 > \frac{T_{\mu+1}}{T_\mu} > \cdots > \frac{T_n}{T_{n-1}} \]

or, which is the same,

\[ T_0 < T_1 < T_2 < \cdots < T_{\mu-1} \le T_\mu \]

\[ T_\mu > T_{\mu+1} > T_{\mu+2} > \cdots > T_n. \]

In other words, the sequence of probabilities increases till the greatest
term \(T_\mu\) is reached and steadily decreases from then on. Besides \(T_\mu\)
there may be another greatest term \(T_{\mu-1}\); namely, when \(T_{\mu-1} = T_\mu\);
but all the other terms are certainly less than \(T_\mu\). The number μ is
perfectly determined by the conditions

\[ \frac{T_\mu}{T_{\mu-1}} = \frac{n - \mu + 1}{\mu} \cdot \frac{p}{q} \ge 1; \qquad \frac{T_{\mu+1}}{T_\mu} = \frac{n - \mu}{\mu + 1} \cdot \frac{p}{q} < 1 \]

which are equivalent to the two inequalities

\[ (n + 1)p \ge \mu(p + q), \qquad np - q < \mu(p + q). \]

These in turn can be presented thus:

\[ \mu \le (n + 1)p < \mu + 1 \]

and show that μ is uniquely determined as the greatest integer contained in
(n + 1)p. If (n + 1)p is an integer, then μ = (n + 1)p and \(T_\mu = T_{\mu-1}\).
That is, there are two greatest terms if, and only if, (n + 1)p is an
integer.

Let us consider now what happens if

\[ n + 1 \le \frac{1}{p} \]

or

\[ n + 1 \le \frac{1}{q}. \]

In the first case, all the terms in (a) are less than 1 with the single
exception of the first term \(T_1/T_0\) which may be equal to 1; namely, when
n + 1 = 1/p. Consequently,

\[ T_0 \ge T_1 > T_2 > \cdots > T_n \]

so that \(T_0\) is the greatest term. If (n + 1)p < 1 the greatest integer
contained in (n + 1)p is 0, and there is only one greatest term \(T_0\). If,
however, (n + 1)p = 1, there are two terms \(T_0 = T_1\) greater than
others.

If (n + 1)q ≤ 1, all the terms in series (a) are > 1 with the exception
of the last term, which may be equal to 1; namely, when (n + 1)q = 1.
Hence,

\[ T_0 < T_1 < \cdots < T_{n-1} \le T_n \]

so that \(T_n\) is the greatest term, and the preceding term \(T_{n-1}\) can be equal
to it only if (n + 1)q = 1. Now the condition

\[ (n + 1)q \le 1 \]

is equivalent to

\[ (n + 1)p \ge n. \]

On the other hand, because p < 1,

\[ (n + 1)p < n + 1. \]

Therefore n is the greatest integer contained in (n + 1)p.

Comparing the results obtained in the last two cases (excluded at
first) with the general rule, we see that in all cases the greatest term
\(T_\mu\) corresponds to

\[ \mu = [(n + 1)p]. \]

If (n + 1)p is an integer, then there are two greatest terms \(T_\mu\) and \(T_{\mu-1}\).
This rule for determining the most probable number of successes is very
simple and easy of application to numerical examples.

Example 1. Let n = 20, p = 2/5, q = 3/5. Then (n + 1)p = 8.4, and the greatest
integer contained in this number is μ = 8. Hence, there is only one most probable
number of successes μ = 8 with the corresponding probability

\[ T_8 = \frac{20!}{8!\,12!} \left(\frac{2}{5}\right)^8 \left(\frac{3}{5}\right)^{12} = 0.1797. \]

Example 2. Let n = 110, p = 1/3, q = 2/3. Then (n + 1)p = 37, an integer.
Consequently, 36 and 37 are the most probable numbers of successes with the
corresponding probability

\[ T_{36} = T_{37} = \frac{110!}{37!\,73!} \left(\frac{1}{3}\right)^{37} \left(\frac{2}{3}\right)^{73} = 0.0801. \]
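
The rule for the most probable number of successes can be compared with a brute-force search for the greatest term. The Python sketch below is illustrative only. It assumes 0 < p < 1 and uses exact fractions so that a tie between two greatest terms is detected reliably; the function names are not from the text.

```python
from fractions import Fraction
from math import comb, floor

def T(n, m, p):
    """Probability of exactly m successes in n trials, formula (2)."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

def most_probable(n, p):
    """Rule of Sec. 6: mu is the greatest integer contained in (n + 1)p;
    when (n + 1)p is an integer, mu - 1 shares the greatest probability."""
    x = (n + 1) * p
    mu = floor(x)
    return [mu - 1, mu] if x == mu else [mu]

def most_probable_brute(n, p):
    """All m at which T_m attains its greatest value, by direct comparison."""
    probs = [T(n, m, p) for m in range(n + 1)]
    best = max(probs)
    return [m for m, t in enumerate(probs) if t == best]

print(most_probable(20, Fraction(2, 5)), float(T(20, 8, Fraction(2, 5))))
print(most_probable(110, Fraction(1, 3)), float(T(110, 37, Fraction(1, 3))))
```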


7. When n, m, and n − m are large numbers, the evaluation of
probability \(T_m\) by the exact formula

\[ T_m = \frac{n!}{m!\,(n - m)!}\, p^m q^{n-m} \]

becomes impracticable and it is necessary to resort to approximations.
For approximate evaluation of large factorials we possess precious means
in the famous “Stirling formula.” Referring the reader to Appendix I
where this formula is established, we shall use it here in the following
form:

\[ \log x! = \log \sqrt{2\pi x} + x \log x - x + \omega(x) \]

where

\[ \frac{1}{12\left(x + \frac{1}{2}\right)} < \omega(x) < \frac{1}{12x}. \]

In the same appendix the following double inequality is proved:

\[ \frac{1}{12k} - \frac{1}{12l} < \omega(k) - \omega(l) < \frac{1}{12k + 6} - \frac{1}{12l + 6} \qquad (k > l). \]

Now from Stirling’s formula

\[ n! = \sqrt{2\pi n}\; n^n e^{-n + \omega(n)} \]

and two similar expressions for m! and (n − m)! follow. Substituting
them into \(T_m\), we get two limits

\[ T_m < \sqrt{\frac{n}{2\pi m(n - m)}} \left(\frac{np}{m}\right)^m \left(\frac{nq}{n - m}\right)^{n-m} k \tag{3} \]

\[ T_m > \sqrt{\frac{n}{2\pi m(n - m)}} \left(\frac{np}{m}\right)^m \left(\frac{nq}{n - m}\right)^{n-m} l \tag{4} \]

where

\[ k = e^{\frac{1}{12n + 6} - \frac{1}{12m + 6} - \frac{1}{12(n - m) + 6}} \]

\[ l = e^{\frac{1}{12n} - \frac{1}{12m} - \frac{1}{12(n - m)}}. \]

When n, m, n − m are even moderately large k and l differ little from
each other.

Inequalities (3) and (4) then give very close upper and lower limits
for \(T_m\). To evaluate powers

\[ \left(\frac{np}{m}\right)^m \left(\frac{nq}{n - m}\right)^{n-m} \]

with large exponents, sufficiently extensive logarithmic tables must be
available. If such tables are lacking, then in cases which ordinarily
occur when ratios np/m and nq/(n − m) are close to 1, we can use
special short tables to evaluate logarithms of these ratios or else resort to
series.
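
As a rough check of limits (3) and (4), the sketch below evaluates both in Python and compares them with the exact value where that is still computable. It is an illustration, not part of the original text. It takes k as the exponential of 1/(12n + 6) − 1/(12m + 6) − 1/(12(n − m) + 6) and l as the exponential of 1/(12n) − 1/(12m) − 1/(12(n − m)); if the garbled source intended different exponents, the sketch would need adjusting. Logarithms are used throughout, which plays the role of the logarithmic tables mentioned above.

```python
from math import comb, exp, log, pi

def stirling_limits(n, m, p):
    """Lower limit (4) and upper limit (3) for T_m, for 0 < m < n."""
    q = 1 - p
    r = n - m
    # Logarithm of sqrt(n / (2 pi m r)) * (np/m)^m * (nq/r)^r.
    log_main = (0.5 * log(n / (2 * pi * m * r))
                + m * log(n * p / m) + r * log(n * q / r))
    log_k = 1 / (12 * n + 6) - 1 / (12 * m + 6) - 1 / (12 * r + 6)
    log_l = 1 / (12 * n) - 1 / (12 * m) - 1 / (12 * r)
    return exp(log_main + log_l), exp(log_main + log_k)

lower, upper = stirling_limits(100, 33, 1 / 3)
exact = comb(100, 33) * (1 / 3) ** 33 * (2 / 3) ** 67
print(lower, exact, upper)
```

Even for n = 10 and m = 5 the two limits agree to about three significant figures, and they tighten quickly as n, m, and n − m grow.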

8. Another problem requiring the probability that the number of
successes will be contained between two given limits is much more
complex in case the number of trials as well as the difference between
given limits is a large number. Ordinarily for approximate evaluation
of probability under such circumstances simple and convenient formulas
are used. These formulas are derived in Chap. VII. Less known is
the ingenious use by Markoff of continued fractions for that purpose.

It suffices to devise a method for approximate evaluation of the
probability that the number of successes will be greater than a given
integer l which can be supposed > np. We shall denote this probability by
P(l). A similar notation Q(l) will be used to denote the probability
that the number of failures is > l where again l > nq. The probability
P(k, l) of the inequalities k ≤ m ≤ l can be expressed as follows:

\[ P(k, l) = 1 - P(l) - Q(n - k) \]

if l > np and k < np;

\[ P(k, l) = P(k - 1) - P(l) \]

if both k and l are > np; and finally

\[ P(k, l) = Q(n - l - 1) - Q(n - k) \]

if both k and l are < np.

For P(l) we have the expression

\[ P(l) = \frac{n!}{(l + 1)!\,(n - l - 1)!}\, p^{l+1} q^{n-l-1} \left[ 1 + \frac{n - l - 1}{l + 2}\,\frac{p}{q} + \frac{(n - l - 1)(n - l - 2)}{(l + 2)(l + 3)} \left(\frac{p}{q}\right)^2 + \cdots \right]. \]

The first factor

\[ \frac{n!}{(l + 1)!\,(n - l - 1)!}\, p^{l+1} q^{n-l-1} \]

can be approximately evaluated by the method of the preceding section.
The whole difficulty resides in the evaluation of the sum

\[ S = 1 + \frac{n - l - 1}{l + 2}\,\frac{p}{q} + \frac{(n - l - 1)(n - l - 2)}{(l + 2)(l + 3)} \left(\frac{p}{q}\right)^2 + \cdots \]

which is a particular case of the hypergeometric series

\[ F(\alpha, \beta, \gamma, x) = 1 + \frac{\alpha\beta}{1 \cdot \gamma}\, x + \frac{\alpha(\alpha + 1)\beta(\beta + 1)}{1 \cdot 2 \cdot \gamma(\gamma + 1)}\, x^2 + \cdots \]

In fact

\[ S = F\left(-n + l + 1,\; 1,\; l + 2,\; -\frac{p}{q}\right). \]

Now, owing to this connection between S and hypergeometric series, S
can be represented in the form of a continued fraction. First, it is
easy to establish the following relations:

\[ F(\alpha, \beta + 1, \gamma + 1, x) = F(\alpha, \beta, \gamma, x) + \frac{\alpha(\gamma - \beta)}{\gamma(\gamma + 1)}\, x\, F(\alpha + 1, \beta + 1, \gamma + 2, x) \]

\[ F(\alpha + 1, \beta, \gamma + 1, x) = F(\alpha, \beta, \gamma, x) + \frac{\beta(\gamma - \alpha)}{\gamma(\gamma + 1)}\, x\, F(\alpha + 1, \beta + 1, \gamma + 2, x). \]

Substituting α + n, β + n, γ + 2n and α + n, β + n + 1, γ + 2n + 1,
respectively, for α, β, γ in these relations and setting

\[ X_{2n} = F(\alpha + n, \beta + n, \gamma + 2n, x); \qquad X_{2n+1} = F(\alpha + n, \beta + n + 1, \gamma + 2n + 1, x) \]

\[ a_{2n} = \frac{(\beta + n)(\gamma - \alpha + n)}{(\gamma + 2n)(\gamma + 2n - 1)}; \qquad a_{2n+1} = \frac{(\alpha + n)(\gamma - \beta + n)}{(\gamma + 2n)(\gamma + 2n + 1)} \]

for brevity, we have

\[ X_0 = X_1 - a_1 x X_2 \]

\[ X_1 = X_2 - a_2 x X_3 \]

\[ \cdots \]

\[ X_{m-1} = X_m - a_m x X_{m+1} \]

whence

\[ \frac{X_1}{X_0} = \cfrac{1}{1 - \cfrac{a_1 x}{1 - \cfrac{a_2 x}{1 - \cdots - \cfrac{a_{m-1} x}{1 - a_m x \dfrac{X_{m+1}}{X_m}}}}}. \]

In our particular case, taking α = −n + l + 1, β = 0, γ = l + 1,

\[ X_1 = F(-n + l + 1, 1, l + 2, x), \qquad X_0 = 1 \]

and \(a_{2n-2l-1} = 0\).

Hence, taking x = −p/q and introducing new notations, we have a
finite continued fraction

\[ S = \cfrac{1}{1 - \cfrac{c_1}{1 + \cfrac{d_1}{1 - \cfrac{c_2}{1 + \cfrac{d_2}{1 - \cdots - \cfrac{c_{n-l-1}}{1 + d_{n-l-1}}}}}}} \tag{5} \]

where

\[ c_k = \frac{(n - k - l)(l + k)\,p}{(l + 2k - 1)(l + 2k)\,q}; \qquad d_k = \frac{k(n + k)\,p}{(l + 2k)(l + 2k + 1)\,q}. \]

Every one of the numbers \(c_k\) will be positive and < 1 if this is true for
\(c_1\). Now

\[ c_1 = \frac{(n - l - 1)\,p}{(l + 2)\,q} < 1 \]

if l > np, and that is exactly what we suppose. The above continued
fraction can be used to obtain approximate values of S in excess or in
defect, as we please. Let us denote the continued fraction

\[ \cfrac{c_k}{1 + \cfrac{d_k}{1 - \cfrac{c_{k+1}}{1 + \cdots}}} \]

by \(\omega_k\). Then

\[ 0 < \omega_k < c_k, \]

which can be easily verified. Furthermore,

\[ S = \frac{1}{1 - \omega_1}; \qquad \omega_1 = \cfrac{c_1}{1 + \cfrac{d_1}{1 - \omega_2}}; \qquad \omega_2 = \cfrac{c_2}{1 + \cfrac{d_2}{1 - \omega_3}} \]

and in general

\[ \omega_k = \cfrac{c_k}{1 + \cfrac{d_k}{1 - \omega_{k+1}}}. \]

Having selected k, depending on the degree of approximation we
desire in the final result (but never too large; k = 5 or less generally
suffices), we use the inequality

\[ 0 < \omega_{k+1} < c_{k+1} \]

to obtain two limits in defect and in excess for \(\omega_k\). Using these limits, we
obtain similar limits for \(\omega_{k-1}, \omega_{k-2}, \omega_{k-3}, \ldots\) and, finally, for \(\omega_1\) and S.
The series of operations will be better illustrated by an example.
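
The backward recursion on the quantities ω can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Markoff's own arrangement of the work: it takes the coefficients of formula (5) to be c_k = (n − k − l)(l + k)p / ((l + 2k − 1)(l + 2k)q) and d_k = k(n + k)p / ((l + 2k)(l + 2k + 1)q), supposes l > np and k < n − l, and uses exact fractions. The function names are hypothetical.

```python
from fractions import Fraction

def c(n, l, p, k):
    """Coefficient c_k of the continued fraction (5)."""
    return Fraction((n - k - l) * (l + k), (l + 2 * k - 1) * (l + 2 * k)) * p / (1 - p)

def d(n, l, p, k):
    """Coefficient d_k of the continued fraction (5)."""
    return Fraction(k * (n + k), (l + 2 * k) * (l + 2 * k + 1)) * p / (1 - p)

def S_direct(n, l, p):
    """S summed term by term; the series ends by itself."""
    total = term = Fraction(1)
    for j in range(1, n - l):
        term *= Fraction(n - l - j, l + 1 + j) * p / (1 - p)
        total += term
    return total

def S_limits(n, l, p, k=5):
    """Limits in defect and in excess for S, starting from 0 < omega_{k+1} < c_{k+1}."""
    lo, hi = Fraction(0), c(n, l, p, k + 1)
    for j in range(k, 0, -1):
        # omega_j decreases as omega_{j+1} increases, so the two limits swap roles.
        lo, hi = (c(n, l, p, j) / (1 + d(n, l, p, j) / (1 - hi)),
                  c(n, l, p, j) / (1 + d(n, l, p, j) / (1 - lo)))
    return 1 / (1 - lo), 1 / (1 - hi)

p = Fraction(1, 3)
low, high = S_limits(300, 115, p)
print(float(low), float(S_direct(300, 115, p)), float(high))
```

For a few hundred trials the direct sum is still cheap, so it serves as a check on the limits; the continued fraction earns its keep when n − l runs into the thousands and only five or six levels are needed.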

9. Let us find approximately the probability that in 9,000 trials an
event with the probability p = 1/3 will occur not more than 3,090 times
and not less than 2,910 times. To this end we must first seek the
probability of more than 3,090 occurrences, which involves, in the first
place, the evaluation of

\[ T_{3091} = \frac{9000!}{3091!\,5909!} \left(\frac{1}{3}\right)^{3091} \left(\frac{2}{3}\right)^{5909}. \]

By using inequalities (3) and (4) of Sec. 7, we find

\[ 0.0011286 < T_{3091} < 0.0011287. \]

Next we turn to the continued fraction to evaluate the sum S. The
following table gives approximate values of \(c_1, c_2, \ldots c_6\) and \(d_1, d_2, \ldots d_5\)
to 5 decimals and less than the exact numbers

    n      c_n        d_n
    1    0.95553    0.00047
    2    0.95444    0.00094
    3    0.95335    0.00140
    4    0.95227    0.00187
    5    0.95119    0.00234
    6    0.95010

We start with the inequalities

\[ 0 < \omega_6 < 0.95011 \]

and then proceed as follows:

\[ 1.00234 < 1 + \frac{d_5}{1 - \omega_6} < 1.04711; \qquad 0.90839 < \omega_5 < 0.94898 \]

\[ 1.02041 < 1 + \frac{d_4}{1 - \omega_5} < 1.03685; \qquad 0.91842 < \omega_4 < 0.93324 \]

\[ 1.01716 < 1 + \frac{d_3}{1 - \omega_4} < 1.02113; \qquad 0.93362 < \omega_3 < 0.93728 \]

\[ 1.01416 < 1 + \frac{d_2}{1 - \omega_3} < 1.01514; \qquad 0.94020 < \omega_2 < 0.94113 \]

\[ 1.00785 < 1 + \frac{d_1}{1 - \omega_2} < 1.00816; \qquad 0.94779 < \omega_1 < 0.94810 \]

\[ \frac{1}{0.05221} < S < \frac{1}{0.05190} \]

\[ 0.02161 < S\,T_{3091} < 0.02175. \]

Hence, we know for certain that

\[ 0.02161 < P(3{,}090) < 0.02175. \]

By a similar calculation it was found that

\[ 0.02129 < Q(6{,}090) < 0.02142, \]

so that

\[ 0.04290 < P(3{,}090) + Q(6{,}090) < 0.04317. \]

The required probability P that the number of successes will be contained
between 2,910 and 3,090 (limits included) lies between 0.95683 and
0.95710 so that, taking P = 0.9570, the error in absolute value will be
less than \(1.7 \times 10^{-4}\).
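
A brute-force cross-check of this example, impracticable by hand but easy on a computer, is to add up the individual terms with the help of the log-gamma function. The Python sketch below is such a check; it is not Markoff's method and not part of the original text, and the names `log_T` and `P_range` are illustrative.

```python
from math import exp, lgamma, log

def log_T(n, m, p):
    """Natural logarithm of T_m = n!/(m!(n-m)!) p^m q^(n-m), via log-gamma."""
    return (lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)
            + m * log(p) + (n - m) * log(1 - p))

def P_range(n, k, l, p):
    """Probability that the number of successes m satisfies k <= m <= l."""
    return sum(exp(log_T(n, m, p)) for m in range(k, l + 1))

n, p = 9000, 1 / 3
first_term = exp(log_T(n, 3091, p))      # the factor multiplying S in P(3,090)
tail_above = P_range(n, 3091, n, p)      # P(3,090): more than 3,090 successes
tail_below = P_range(n, 0, 2909, p)      # Q(6,090): more than 6,090 failures
middle = P_range(n, 2910, 3090, p)       # the required probability

print(first_term, tail_above, tail_below, middle)
```

Working with logarithms keeps the very large factorials and very small powers within floating-point range; terms far out in the tails simply underflow to zero without affecting the sum.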


Problems for Solution 

1. What is the probability of having 12 three times in 100 tosses of 2 dice?

\[ \text{Ans. } \frac{100!}{3!\,97!} \left(\frac{1}{36}\right)^3 \left(\frac{35}{36}\right)^{97} = 0.2257. \]

2. What is the probability for an event E to occur at least once, or twice, or three
times, in a series of n independent trials with the probability p? Ans.

\[ (a)\; 1 - (1 - p)^n; \qquad (b)\; 1 - (1 - p)^{n-1}[1 + (n - 1)p]; \]

\[ (c)\; 1 - (1 - p)^{n-2}\left[1 + (n - 2)p + \frac{(n - 1)(n - 2)}{2}\, p^2\right]. \]

3. What is the probability of having 12 points with 2 dice at least three times in
100 throws? Ans. 0.528.

4. In a series of 100 independent trials with the probability 1/3, what is the most
probable number of successes and its probability? Ans. μ = 33; \(T_{33} = 0.0844\).

Note: Log 100! = 157.97000; Log 67! = 94.56195; Log 33! = 36.93869.

5. A player wins $1 if he throws heads two times in succession; otherwise he loses
25 cents. If this game is repeated 100 times, what is the probability that neither his
gain nor loss will exceed $1? Or $5? Ans.

\[ (a)\; \frac{100!}{20!\,80!} \left(\frac{1}{4}\right)^{20} \left(\frac{3}{4}\right)^{80} = 0.0493; \]

\[ (b)\; \frac{100!}{20!\,80!} \left(\frac{1}{4}\right)^{20} \left(\frac{3}{4}\right)^{80} \left[ 1 + \frac{80}{63} + \frac{80 \cdot 79}{63 \cdot 66} + \frac{80 \cdot 79 \cdot 78}{63 \cdot 66 \cdot 69} + \frac{80 \cdot 79 \cdot 78 \cdot 77}{63 \cdot 66 \cdot 69 \cdot 72} + \frac{60}{81} + \frac{60 \cdot 57}{81 \cdot 82} + \frac{60 \cdot 57 \cdot 54}{81 \cdot 82 \cdot 83} + \frac{60 \cdot 57 \cdot 54 \cdot 51}{81 \cdot 82 \cdot 83 \cdot 84} \right] = 0.4506. \]

Note: Log 20! = 18.38612; Log 80! = 118.85473.

6. Show that in a series of 2s trials with the probability 1/2 the most probable
number of successes is s and the corresponding probability

\[ T_s = \frac{1 \cdot 3 \cdot 5 \cdots (2s - 1)}{2 \cdot 4 \cdot 6 \cdots 2s}. \]

Show also that

\[ T_s < \frac{1}{\sqrt{2s + 1}}. \]

Hint:

\[ T_s < \frac{2 \cdot 4 \cdot 6 \cdots 2s}{3 \cdot 5 \cdot 7 \cdots (2s + 1)}. \]

7. Prove the following theorem: If P and P' are probabilities of the most probable
number of successes, respectively, in n and n + 1 trials, then P' ≤ P, the equality
sign being excluded unless (n + 1)p is an integer.

8. Show that the probability \(T_\mu\) corresponding to the most probable number of
successes in n trials, is asymptotic to \((2\pi npq)^{-1/2}\), that is,

\[ \lim T_\mu \sqrt{2\pi npq} = 1 \quad \text{as } n \to \infty. \]

9. When p = 1/2, the following inequality holds for every m:

„ ^ /y e-'’_ _ 

"a* 

n 



m 



if

10. What is the probability of 215 successes in 1,000 trials if p = 1/5?

Ans. 0.0154.

11. What is the probability that in 2,000 trials the number of successes will be
contained between 460 and 540 (limits included) if p = 1/4? Ans. 0.964.

12. Two players A and B agree to play until one of them wins a certain number of
games, the probabilities for A and B to win a single game being p and q = 1 − p.
However, they are forced to quit when A has a games still to win, and B has b games.
How should they divide their total stake to be fair?

This problem is known as “problème de parties,” one of the first problems on
probability discussed and solved by Fermat and Pascal in their correspondence.

Solution 1. Let P denote the probability that A will win a remaining games before
B can win b games, and let Q = 1 − P denote the probability for B to win b games
before A wins a games. To be fair, the players must divide their common stake M in
the ratio P:Q and leave the sum MP to A and the sum MQ to B.

To find P, notice that A wins in the following mutually exclusive ways:

a. If he wins in exactly a games; probability \(p^a\).

b. If he wins in exactly a + 1 games; probability \(\frac{a}{1}\, p^a q\).

c. If he wins in exactly a + 2 games; probability \(\frac{a(a + 1)}{1 \cdot 2}\, p^a q^2\).

. . .

n. If he wins in exactly a + b − 1 games; probability

\[ \frac{a(a + 1) \cdots (a + b - 2)}{1 \cdot 2 \cdot 3 \cdots (b - 1)}\, p^a q^{b-1}. \]

Consequently

\[ P = p^a \left[ 1 + \frac{a}{1}\, q + \frac{a(a + 1)}{1 \cdot 2}\, q^2 + \cdots + \frac{a(a + 1) \cdots (a + b - 2)}{1 \cdot 2 \cdots (b - 1)}\, q^{b-1} \right] \]

and similarly

\[ Q = q^b \left[ 1 + \frac{b}{1}\, p + \frac{b(b + 1)}{1 \cdot 2}\, p^2 + \cdots + \frac{b(b + 1) \cdots (b + a - 2)}{1 \cdot 2 \cdots (a - 1)}\, p^{a-1} \right]. \]

Show directly that P + Q = 1.

\[ \text{Hint: } \frac{dP}{dp} + \frac{dQ}{dp} = 0. \]

Solution 2. The same problem can be solved in a different way. Whether A or B
wins will be decided in not more than a + b − 1 games. Now if the players continue
to play until the number of games reaches the limit a + b − 1, the number of games
won by A must be not less than a. And conversely, if this number is not less than a, A
will win a games before B wins b games. Therefore, P is the probability that in
a + b − 1 games A wins not less than a times, or

\[ P = \frac{(a + b - 1)!}{a!\,(b - 1)!}\, p^a q^{b-1} \left[ 1 + \frac{b - 1}{a + 1}\,\frac{p}{q} + \frac{(b - 1)(b - 2)}{(a + 1)(a + 2)} \left(\frac{p}{q}\right)^2 + \cdots + \frac{(b - 1)(b - 2) \cdots 2 \cdot 1}{(a + 1)(a + 2) \cdots (a + b - 1)} \left(\frac{p}{q}\right)^{b-1} \right]. \]

Show directly that both expressions for P are identical.

Hint: Proceed as before.
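
The agreement of the two solutions of Problem 12 can be checked numerically before attempting the proof. The Python sketch below is an illustration only; the function names `P_solution_1` and `P_solution_2` are assumptions of this sketch. The first function sums over the game at which A secures his last needed win, the second counts the outcomes of a + b − 1 games in which A wins at least a times.

```python
from fractions import Fraction
from math import comb

def P_solution_1(a, b, p):
    """A still needs a wins and B needs b; A's last win comes at game a + j."""
    q = 1 - p
    # comb(a + j - 1, j) = a(a+1)...(a+j-1) / (1*2*...*j)
    return sum(comb(a + j - 1, j) * p**a * q**j for j in range(b))

def P_solution_2(a, b, p):
    """Play a + b - 1 games regardless; A must win at least a of them."""
    q = 1 - p
    n = a + b - 1
    return sum(comb(n, m) * p**m * q**(n - m) for m in range(a, n + 1))

p = Fraction(1, 2)
P = P_solution_1(2, 3, p)
Q = P_solution_1(3, 2, 1 - p)   # B's chance: interchange a with b and p with q
print(P, Q, P + Q, P_solution_2(2, 3, p))
```

With a stake M the fair division is then MP to A and MQ to B, as stated in Solution 1.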
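13. Prove the identity

\[ p^n + \frac{n}{1}\, p^{n-1} q + \frac{n(n - 1)}{1 \cdot 2}\, p^{n-2} q^2 + \cdots + \frac{n(n - 1) \cdots (n - k + 1)}{1 \cdot 2 \cdots k}\, p^{n-k} q^k = \frac{\displaystyle\int_0^p x^{n-k-1} (1 - x)^k \, dx}{\displaystyle\int_0^1 x^{n-k-1} (1 - x)^k \, dx}. \]

Hint: Take derivatives with respect to p.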

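14. A and B have, respectively, n + 1 and n coins. If they toss their coins
simultaneously, what is the probability that (a) A will have more heads than B?
(b) A and B will have an equal number of heads? (c) B will have more heads than A?

Solution. a. Let \(P_n\) be the probability for A to have more heads than B. This
probability can be expressed as the double sum

\[ P_n = \frac{1}{2^{2n+1}} \sum_{x=1}^{n+1} \sum_{\alpha=0}^{n} C_{n+1}^{\alpha+x}\, C_n^{\alpha}. \]

Considering the coefficient of \(t^x\) in

\[ (1 + t)^{n+1} \left(1 + \frac{1}{t}\right)^n = \frac{(1 + t)^{2n+1}}{t^n} \]

we have

\[ \sum_{\alpha=0}^{n} C_{n+1}^{\alpha+x}\, C_n^{\alpha} = C_{2n+1}^{n+x}. \]

Hence

\[ P_n = \frac{1}{2^{2n+1}} \sum_{x=1}^{n+1} C_{2n+1}^{n+x} = \frac{1}{2}. \]

b. The probability \(Q_n\) for A and B to have an equal number of heads is

\[ Q_n = \frac{1}{2^{2n+1}} \sum_{\alpha=0}^{n} C_{n+1}^{\alpha}\, C_n^{\alpha} = \frac{C_{2n+1}^{n}}{2^{2n+1}}. \]

c. The probability \(R_n\) for B to have more heads than A is

\[ R_n = \frac{1}{2} - \frac{C_{2n+1}^{n}}{2^{2n+1}}. \]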
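
The three answers to Problem 14 can be confirmed for small n by counting. The Python sketch below is illustrative and not part of the original text: it weighs every pair of head counts by the number of ways it can arise among the 2n + 1 coins.

```python
from fractions import Fraction
from math import comb

def heads_comparison(n):
    """Exact probabilities that A (n + 1 coins) shows more, as many, or fewer
    heads than B (n coins), all coins being unbiased."""
    total = 2 ** (2 * n + 1)          # equally likely outcomes of all the coins
    more = equal = fewer = 0
    for a in range(n + 2):            # heads shown by A
        for b in range(n + 1):        # heads shown by B
            ways = comb(n + 1, a) * comb(n, b)
            if a > b:
                more += ways
            elif a == b:
                equal += ways
            else:
                fewer += ways
    return Fraction(more, total), Fraction(equal, total), Fraction(fewer, total)

for n in range(1, 5):
    print(n, heads_comparison(n))
```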


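15. If each of n independent trials can result in one of the m incompatible events
\(E_1, E_2, \ldots E_m\) with the respective probabilities

\[ p_1, p_2, \ldots p_m; \qquad (p_1 + p_2 + \cdots + p_m = 1), \]

show that the probability to have \(l_1\) events \(E_1\), \(l_2\) events \(E_2\), . . . \(l_m\) events \(E_m\) where
\(l_1 + l_2 + \cdots + l_m = n\), is given by

\[ \frac{n!}{l_1!\,l_2! \cdots l_m!}\; p_1^{l_1} p_2^{l_2} \cdots p_m^{l_m}. \]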

CHAPTER IV 


PROBABILITIES OF HYPOTHESES AND BAYES’ THEOREM 

1. The nature of the problems with which we deal in this chapter may
be illustrated by the following simple example: Urns 1 and 2 contain,
respectively, 2 white and 3 black balls, and 4 white and 1 black balls.
One of the urns is selected at random and one ball is drawn. It happens
to be white. What is the probability that it came from the first urn?
Before the ball was drawn and its color revealed, the probability that the
first urn would be chosen had been 1/2; but the indication of the color
of the ball that was drawn altered this probability. To find this new
probability, the following artifice can be used:

Imagine that balls from both urns are put together in a third urn.
To distinguish their origin, balls from the first urn are marked with 1
and those from the second urn are marked with 2. Since there are 5
balls marked with 1 and the same number marked with 2, in taking one
ball from the third urn we have equal chances to take one coming from
either the first or the second urn, and the situation is exactly the same
as if we chose one of the urns at random and drew one ball from it.
If the ball drawn from the third urn happens to be white, this can happen
in 2 + 4 = 6 equally likely cases. Only in 2 of these cases will the
extracted ball have the mark 1. Hence, the probability that the white
ball came from the first urn is 2/6 = 1/3.

The success of this artifice depends on the equality of the number of
balls in both urns. It can be applied to the case of an unequal number
of balls in the urns, but with some modifications; however, it seems
preferable to follow a regular method for solving problems like the
preceding one.

2. The problem just solved is a particular case of the following
fundamental:

Problem 1. An event A can occur only if one of the set of exhaustive
and incompatible events

\[ B_1, B_2, \ldots B_n \]

occurs. The probabilities of these events

\[ (B_1), (B_2), \ldots (B_n) \]

corresponding to the total absence of any knowledge as to the occurrence
or nonoccurrence of A, are known. Known also, are the conditional
probabilities

\[ (A, B_i); \qquad i = 1, 2, \ldots n \]

for A to occur, assuming the occurrence of \(B_i\). How does the
probability of \(B_i\) change with the additional information that A has actually
happened?

Solution. The question amounts to finding the conditional
probability \((B_i, A)\). The probability of the compound event \(AB_i\) can be
presented in two forms

\[ (AB_i) = (B_i)(A, B_i) \]

or

\[ (AB_i) = (A)(B_i, A). \]

Equating the right-hand members, we derive the following expression
for the unknown probability \((B_i, A)\):

\[ (B_i, A) = \frac{(B_i)(A, B_i)}{(A)}. \]

Since the event A can materialize in the mutually exclusive forms

\[ AB_1, AB_2, \ldots AB_n, \]

by applying the theorem of total probability, we get

\[ (A) = (B_1)(A, B_1) + (B_2)(A, B_2) + \cdots + (B_n)(A, B_n). \]

It suffices now to introduce this expression into the preceding formula for
\((B_i, A)\) to get the final expression

\[ (B_i, A) = \frac{(B_i)(A, B_i)}{(B_1)(A, B_1) + (B_2)(A, B_2) + \cdots + (B_n)(A, B_n)}. \tag{1} \]

This formula, when described in words, constitutes the so-called
“Bayes’ theorem.” However, it is hardly necessary to describe its
content in words; symbols speak better for themselves. For that
reason, we prefer to speak of Bayes’ formula rather than of Bayes’
theorem. Bayes’ formula is also known as the “formula for probabilities
of hypotheses.” The reason for that name is that the events \(B_1, B_2, \ldots B_n\)
may be considered as hypotheses to account for the occurrence of A.
It is customary to speak of probabilities

\[ (B_1), (B_2), \ldots (B_n) \]

as a priori probabilities of hypotheses

\[ B_1, B_2, \ldots B_n, \]

while probabilities

\[ (B_i, A); \qquad i = 1, 2, \ldots n \]

are called a posteriori probabilities of the same hypotheses.

3. A few examples will help us to understand the meaning and the 
use of Bayes’ formula. 
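
Before the worked examples, Bayes' formula itself can be sketched as a short Python function. This is an illustration, not part of the original text; the name `posterior` and its argument names are chosen here. The function is tried on the two-urn illustration of Sec. 1, where the urns are equally likely a priori and the chances of a white ball are 2/5 and 4/5.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """A posteriori probabilities (B_i, A) from the a priori probabilities (B_i)
    and the conditional probabilities (A, B_i), by Bayes' formula."""
    joint = [b * a for b, a in zip(priors, likelihoods)]   # (B_i)(A, B_i)
    total = sum(joint)                                     # (A), by total probability
    return [j / total for j in joint]

priors = [Fraction(1, 2), Fraction(1, 2)]         # either urn chosen at random
likelihoods = [Fraction(2, 5), Fraction(4, 5)]    # chance of white from urn 1, urn 2
print(posterior(priors, likelihoods))
```

The first entry reproduces the value 1/3 obtained in Sec. 1 by the artifice of the third urn, without requiring the urns to hold equal numbers of balls.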


Example 1. The contents of urns 1, 2, 3, are as follows:

1 white, 2 black, 3 red balls

2 white, 1 black, 1 red balls

4 white, 5 black, 3 red balls

One urn is chosen at random and two balls drawn. They happen to be white and red.
What is the probability that they came from urn 2 or 3?

Solution. The event A represents the fact that two balls taken from the selected
urn were of white and red color, respectively. To account for this fact, we have three
hypotheses: The selected urn was 1 or 2 or 3. We shall represent these hypotheses in
the order indicated by \(B_1, B_2, B_3\). Since nothing distinguishes the urns, the
probabilities of these hypotheses before anything was known about A are

\[ (B_1) = (B_2) = (B_3) = \tfrac{1}{3}. \]

The probabilities of A, assuming these hypotheses, are

\[ (A, B_1) = \tfrac{1}{5}; \qquad (A, B_2) = \tfrac{1}{3}; \qquad (A, B_3) = \tfrac{2}{11}. \]

It remains now to introduce these values into formula (1) to have a posteriori
probabilities

\[ (B_2, A) = \tfrac{55}{118}; \qquad (B_3, A) = \tfrac{15}{59}. \]

Example 2. It is known that an urn containing altogether 10 balls was filled in 
the following manner: A coin was tossed 10 times, and according as it showed heads 
or tails, one white or one black ball was put into the urn. Balls are drawn from this 
urn one at a time, 10 times in succession Galways being returned before the next draw¬ 
ing) and every one turns out to be white. What is the probability that the um con¬ 
tains nothing but white balls? 

Solution. The event A consists in the fact that in 10 independent trials with a 
definite but unknown probability, only white balls appear. To accoimt for this fact, 
we have 10 hypotheses regarding the number of white balls in the urn; namely, that 
this number is either 1, or 2, or 3, . . .or 10. The a priori probability of the hypo¬ 
thesis Bi that there are exactly i white balls in the um, according to the manner in 
which the um was filled, is the same as the probability of having i heads in 10 throws; 
that is. 


m 


10 ! 1 
t !(10 - 


t = 1, 2, . . 


10 . 


Granted the hypothesis Bt, the probability of A is 
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The problem requires us to find (Bio, A), The expression of this probability immedi' 
ately results from Bayes’ formula: 


(Bio, A) 



The denominator of this fraction is 


Hence 


14.247. 

(Bio, A) = 0.0702. 


This probability, although still small, is much greater than Ho 24 f the a priori prob¬ 
ability of having only white balls in the urn. 

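If, instead of 10 drawings, $m$ drawings have been made and at each drawing white
balls appeared, the probability $(B_{10}, A)$ would be given by

$$(B_{10}, A) = \frac{1}{\sum_{i=1}^{10} C_{10}^{i} \left(\frac{i}{10}\right)^{m}}$$

where $C_{10}^{i} = 10!/(i!\,(10 - i)!)$. The denominator of this formula can be presented thus:

$$\sum_{i=0}^{10} C_{10}^{i} \left(1 - \frac{i}{10}\right)^{m}.$$

Now

$$1 - \frac{i}{10} \leq e^{-\frac{i}{10}}$$

and so

$$\sum_{i=0}^{10} C_{10}^{i} \left(1 - \frac{i}{10}\right)^{m} < \sum_{i=0}^{10} C_{10}^{i} e^{-\frac{mi}{10}} = \left(1 + e^{-\frac{m}{10}}\right)^{10}.$$

Hence

$$(B_{10}, A) > \left(1 + e^{-\frac{m}{10}}\right)^{-10}.$$

This shows that with increasing $m$ the probability $(B_{10}, A)$ rapidly approaches 1.
For instance, if $m = 100$

$$(B_{10}, A) > (1 + e^{-10})^{-10} > (1.0000454)^{-10} > 0.99954.$$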


Thus, after 100 drawings producing only white balls, it is almost certain that the 
urn contains nothing but white balls—a conclusion which mere common sense would 
dictate. 
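
The arithmetic of this example can be checked by machine. What follows is a minimal sketch in Python using exact rational arithmetic; the function name `posterior_all_white` and its parameters are our own labels and do not come from the text. It evaluates Bayes’ formula for the hypothesis that all balls are white after $m$ white drawings.

```python
from fractions import Fraction
from math import comb

def posterior_all_white(m, balls=10):
    """Posterior probability that every ball is white after m white drawings."""
    weights = []
    for i in range(balls + 1):
        prior = Fraction(comb(balls, i), 2 ** balls)   # i heads in `balls` tosses
        likelihood = Fraction(i, balls) ** m           # m white draws, with replacement
        weights.append(prior * likelihood)
    return weights[balls] / sum(weights)

p = posterior_all_white(10)
print(float(1 / p))                      # the denominator, about 14.247
print(float(p))                          # about 0.0702
print(float(posterior_all_white(100)))   # exceeds the bound 0.99954
```

The hypothesis of no white balls is included in the loop for completeness; its likelihood vanishes, so it does not affect the result.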

Example 3. Two urns, 1 and 2, contain respectively 2 white and 1 black ball, 
and 1 white and 5 black balls. One ball is transferred from urn 1 to urn 2 and then 
one ball is drawn from the latter. It happens to be white. What is the probability 
that the transferred ball was black? 
Solution. Here we have two hypotheses: $B_1$, that the transferred ball was black,
and $B_2$, that it was white. The a priori probabilities of these hypotheses are

$$(B_1) = \tfrac{1}{3}, \qquad (B_2) = \tfrac{2}{3}.$$

The probabilities of drawing a white ball from urn 2, granted that $B_1$ or $B_2$ is true,
are:

$$(A, B_1) = \tfrac{1}{7}, \qquad (A, B_2) = \tfrac{2}{7}.$$

The probability of $B_1$, after a white ball has been drawn from the second urn,
results from Bayes’ formula:

$$(B_1, A) = \frac{\frac{1}{3} \cdot \frac{1}{7}}{\frac{1}{3} \cdot \frac{1}{7} + \frac{2}{3} \cdot \frac{2}{7}} = \frac{1}{5}.$$


4. Problem 2. Retaining the notations, conditions, and data of
Prob. 1, find the probability of materialization of another event $C$
granted that $A$ has actually occurred. Conditional probabilities

$$(C, AB_i); \qquad i = 1, 2, \ldots n$$

are supposed to be known.

Solution. Since the fact of the occurrence of $A$ involves that of one,
and only one, of the events

$$B_1, B_2, \ldots B_n,$$

the event $C$ (granted the occurrence of $A$) can materialize in the following
mutually exclusive forms

$$CB_1, CB_2, \ldots CB_n.$$

Consequently, the probability $(C, A)$ which we are seeking is given by

$$(C, A) = (CB_1, A) + (CB_2, A) + \cdots + (CB_n, A).$$

Applying the theorem of compound probability, we have

$$(CB_i, A) = (B_i, A)(C, B_iA)$$

and

$$(C, A) = (B_1, A)(C, AB_1) + (B_2, A)(C, AB_2) + \cdots + (B_n, A)(C, AB_n).$$

It suffices now to substitute for $(B_i, A)$ its expression given by Bayes’ formula,
to find the final expression

$$(C, A) = \frac{\sum_{i=1}^{n} (B_i)(A, B_i)(C, AB_i)}{\sum_{i=1}^{n} (B_i)(A, B_i)}. \tag{3}$$

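
Bayes’ formula and formula (3) translate directly into a few lines of code. The sketch below is only an illustration: the function names `bayes` and `probability_of_c` are our own, and the event $C$ used at the end (a second white ball drawn from urn 2 without returning the first) is a hypothetical continuation of Example 3 that does not appear in the text.

```python
from fractions import Fraction as F

def bayes(priors, likelihoods):
    """A posteriori probabilities (B_i, A) from (B_i) and (A, B_i)."""
    joint = [b * a for b, a in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

def probability_of_c(priors, likelihoods, c_given_ab):
    """Formula (3): (C, A) as the posterior-weighted sum of the (C, AB_i)."""
    posterior = bayes(priors, likelihoods)
    return sum(b * c for b, c in zip(posterior, c_given_ab))

# Data of Example 3: B_1 = black ball transferred, B_2 = white ball transferred.
priors = [F(1, 3), F(2, 3)]
likelihoods = [F(1, 7), F(2, 7)]          # a white ball is drawn from urn 2
print(bayes(priors, likelihoods)[0])      # 1/5

# Hypothetical event C: a second ball drawn from urn 2, the first not being
# returned, is also white. Under B_1 no white ball is left; under B_2, 1 of 6.
print(probability_of_c(priors, likelihoods, [F(0), F(1, 6)]))   # 2/15
```
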
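$$(B_i, A) = \frac{C_n^m \alpha_i p_i^m (1 - p_i)^{n-m}}{\sum_{i=1}^{k} C_n^m \alpha_i p_i^m (1 - p_i)^{n-m}}$$

or, canceling $C_n^m$,

$$(B_i, A) = \frac{\alpha_i p_i^m (1 - p_i)^{n-m}}{\sum_{i=1}^{k} \alpha_i p_i^m (1 - p_i)^{n-m}}.$$

Now, applying the theorem of total probability, the probability $P$ of the
inequalities

$$\alpha \leq p \leq \beta$$

will be given by

$$P = \frac{\sum \alpha_i p_i^m (1 - p_i)^{n-m}}{\sum_{i=1}^{k} \alpha_i p_i^m (1 - p_i)^{n-m}} \tag{4}$$

where the summation in the numerator refers to all values of $p_i$ lying
between $\alpha$ and $\beta$, limits included.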

An important particular case arises when the set of hypothetical
probabilities is

$$p_1 = \frac{1}{k}, \quad p_2 = \frac{2}{k}, \quad \ldots \quad p_k = 1$$

and the a priori probabilities of these hypotheses are equal:

$$\alpha_1 = \alpha_2 = \cdots = \alpha_k = \frac{1}{k}.$$

Then the fraction $1/k$ can be canceled in both numerator and denominator.
The final formula for the probability of the inequalities

$$\alpha \leq p \leq \beta$$

will be

$$P = \frac{\sum p_i^m (1 - p_i)^{n-m}}{\sum_{i=1}^{k} p_i^m (1 - p_i)^{n-m}} \tag{5}$$

summation in numerator being extended over all positive integers $i$
satisfying the inequalities

$$k\alpha \leq i \leq k\beta.$$

In the limit, when $k$ tends to infinity, the a priori probability of the
inequalities

$$\alpha \leq p \leq \beta$$

is given simply by the length $\beta - \alpha$ of the interval $(\alpha, \beta)$. The a posteriori
probability of the same inequalities is obtained as the limit of
expression (5). Now, as $k \to \infty$, the sums

$$\frac{1}{k} \sum_{k\alpha \leq i \leq k\beta} p_i^m (1 - p_i)^{n-m} \quad \text{and} \quad \frac{1}{k} \sum_{i=1}^{k} p_i^m (1 - p_i)^{n-m}$$

tend to the definite integrals

$$\int_\alpha^\beta x^m (1 - x)^{n-m}\,dx \quad \text{and} \quad \int_0^1 x^m (1 - x)^{n-m}\,dx.$$

Therefore, in the limit, the a posteriori probability of the inequalities

$$\alpha \leq p \leq \beta$$

is expressed by the ratio of two definite integrals

$$P = \frac{\int_\alpha^\beta x^m (1 - x)^{n-m}\,dx}{\int_0^1 x^m (1 - x)^{n-m}\,dx}. \tag{6}$$

This formula leads to the following conclusion: When the unknown
probability $p$ of an event $E$ may have any value between 0 and 1 and the a
priori probability of its being contained between limits $\alpha$ and $\beta$ is $\beta - \alpha$,
then after $n$ trials in which $E$ occurred $m$ times, the a posteriori probability
of $p$ being contained between $\alpha$ and $\beta$ is given by formula (6).
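
Formula (6) can be evaluated exactly for moderate $n$, since the integrand is a polynomial. The following sketch, under the assumption that $(1 - x)^{n-m}$ is expanded by the binomial theorem and integrated term by term, uses rational arithmetic throughout. The function names and the numerical data (7 occurrences in 10 trials) are our own illustration and are not taken from the text.

```python
from fractions import Fraction
from math import comb

def integral(m, n, lo, hi):
    """Exact value of the integral of x**m * (1 - x)**(n - m) from lo to hi."""
    k = n - m
    total = Fraction(0)
    for j in range(k + 1):
        # (1 - x)**k expanded by the binomial theorem, integrated term by term
        e = m + j + 1
        total += Fraction((-1) ** j * comb(k, j), e) * (hi ** e - lo ** e)
    return total

def posterior_interval(m, n, lo, hi):
    """Formula (6): a posteriori probability that lo <= p <= hi."""
    lo, hi = Fraction(lo), Fraction(hi)
    return integral(m, n, lo, hi) / integral(m, n, Fraction(0), Fraction(1))

# Illustrative data (not from the text): 7 occurrences in 10 trials.
print(posterior_interval(7, 10, Fraction(1, 2), 1))   # 227/256
```

For large $n$ the alternating sum involves very large integers; exact fractions keep it correct, at the cost of speed.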

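6. Problem 4. Assumptions and data being the same as in Prob. 3,
find the probability that in $n_1$ trials, following $n$ trials, which produced
$E$ $m$ times, the same event will occur $m_1$ times.

Solution. It suffices to take in formula (3)

$$(B_i) = \alpha_i; \qquad (A, B_i) = C_n^m p_i^m (1 - p_i)^{n-m}$$

and

$$(C, AB_i) = C_{n_1}^{m_1} p_i^{m_1} (1 - p_i)^{n_1 - m_1}$$

to find for the required probability this expression:

$$Q = C_{n_1}^{m_1} \frac{\sum_{i=1}^{k} \alpha_i p_i^{m + m_1} (1 - p_i)^{n + n_1 - m - m_1}}{\sum_{i=1}^{k} \alpha_i p_i^m (1 - p_i)^{n-m}}. \tag{7}$$

Supposing again

$$p_i = \frac{i}{k}; \qquad \alpha_1 = \alpha_2 = \cdots = \alpha_k = \frac{1}{k}$$
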
and letting $k \to \infty$, formula (7) in the limit becomes

$$Q = C_{n_1}^{m_1} \frac{\int_0^1 x^{m + m_1} (1 - x)^{n + n_1 - m - m_1}\,dx}{\int_0^1 x^m (1 - x)^{n-m}\,dx}. \tag{8}$$

This formula leads to the following conclusion: When the unknown
probability $p$ of an event $E$ may have any value between limits 0 and 1
and the a priori probability of its being contained between $\alpha$ and $\beta$ is
$\beta - \alpha$ (so that equal probabilities correspond to intervals of equal length),
the probability that the event $E$ will happen $m_1$ times in $n_1$ trials following
$n$ trials which produced $E$ $m$ times is given by formula (8).

In particular, for $n_1 = m_1 = 1$ (evaluating integrals by the known
formula), we have

$$Q = \frac{m + 1}{n + 2}.$$

This is the much disputed “law of succession” established by Laplace.
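
The passage from formula (7) to formula (8), and the special value $(m + 1)/(n + 2)$, can be verified numerically. The sketch below is an illustration under our own naming (`beta_integral`, `formula_7`, `formula_8` are not from the text). It uses the known value $a!\,b!/(a + b + 1)!$ of the integral of $x^a (1 - x)^b$ over $(0, 1)$, and compares it with the finite sum for a large number $k$ of equally likely hypotheses.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral(a, b):
    """Integral of x**a * (1 - x)**b over (0, 1), equal to a! b! / (a + b + 1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def formula_8(m, n, m1, n1):
    """Limit formula (8) under the uniform a priori assumption."""
    return (comb(n1, m1) * beta_integral(m + m1, n + n1 - m - m1)
            / beta_integral(m, n - m))

def formula_7(m, n, m1, n1, k):
    """Formula (7) with p_i = i/k and equal a priori probabilities 1/k."""
    ps = [i / k for i in range(1, k + 1)]
    num = sum(p ** (m + m1) * (1 - p) ** (n + n1 - m - m1) for p in ps)
    den = sum(p ** m * (1 - p) ** (n - m) for p in ps)
    return comb(n1, m1) * num / den

print(formula_8(3, 5, 1, 1))         # 4/7, that is (m + 1)/(n + 2)
print(formula_7(3, 5, 1, 1, 1000))   # close to 4/7 = 0.5714...
```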

7. Bayes’ formula, and other conclusions derived from it, are necessary
consequences of fundamental concepts and theorems of the theory of
probability. Once we admit these fundamentals, we must admit Bayes’
formula and all that follows from it.

But the question arises: When may the various results established
in this chapter be legitimately applied? In general, they may be applied
whenever all the conditions of their validity are fulfilled; and in some
artificial theoretical problems like those considered in this chapter, they
unquestionably are legitimately applied. But in the case of practical
applications it is not easy to make sure that all the conditions of validity
are fulfilled, though there are some practical problems in which the use
of Bayes’ formula is perfectly legitimate.* In the history of probability
it has happened that even the most illustrious men, like Laplace and
Poisson, went farther than they were entitled to go and made free use
principally of formulas (6) and (8) in various important practical problems.
Against the indiscriminate use of these formulas sharp objections
have been raised by a number of authors, especially in modern times.

The first objection is of a general nature and hits the very existence 
of a priori probabilities. If an urn is given to us and we know only that 
it contains white and black balls, it is evident that no means are available 
to estimate a priori probabilities of various hypotheses as to the proportion
of white balls. Hence, critics say, a priori probabilities do not exist
at all, and it is futile to attempt to apply Bayes’ formula to an urn with
an unknown proportion of balls. At first this objection may appear
very convincing, but its force is somewhat lessened by considering the
peculiar mode of existence of mathematical objects.

* One such problem can be found in an excellent book by Thornton C. Fry,
“Probability and Its Engineering Uses,” New York, 1928.

Some property of integers, unknown to me, is not present in my 
mind, but it is hardly permissible to say that it does not exist; for it does 
exist in the minds of those who discover this property and know how to 
prove it. 

Similarly, our urn might have been filled by some person, or selected
from among urns with known contents. To this person the a priori 
probabilities of various proportions of white and black balls might 
have been known. To us they are unknown, but this should not prevent 
us from attributing to them some potential mode of existence at least as 
a sort of belief. 

To admit a belief in the existence of certain unknown numbers is 
common to all sciences where mathematical analysis is applied to the 
world of reality. If we are allowed to introduce the element of belief 
into such “exact” sciences as astronomy and physics, it would be only
fair to admit it in practical applications of probability. 

The second and very serious objection is directed against the use of 
formula (6), and for similar reasons against formula (8). Imagine, 
again, that we are provided with an urn containing an enormous number 
of white and black balls in completely unknown proportion. Our aim 
is to find the probability that the proportion of white balls to the total 
number of balls is contained between two given limits. To that end, we 
make a long series of trials as described in Prob. 3 and find that actually
in $n$ trials, white balls appeared $m$ times. The probability we seek would
result from Bayes’ formula, provided numerical values of a priori probabilities,
assumed on belief to be existent, were known. Lacking such
knowledge, an arbitrary assumption is made, namely, that all the a
priori probabilities have the same value. Then, on account of the
enormous number of balls in our urn, formula (6) can be used as an
approximate expression of $P$. It can be shown that, given an arbitrary
positive number $\epsilon$, however small, the probability of the inequalities

$$\frac{m}{n} - \epsilon < p < \frac{m}{n} + \epsilon$$

can be made as near to 1 as we please by taking the number of trials
greater than a certain number $N(\epsilon)$ depending upon $\epsilon$ alone. In other
words, with practical certainty we can expect the proportion of white
balls to the total number of balls in our urn to be contained within
arbitrarily narrow limits

$$\frac{m}{n} - \epsilon \quad \text{and} \quad \frac{m}{n} + \epsilon.$$

A conclusion like this would certainly be of the greatest importance.
But it is vitiated by the arbitrary assumption made at the beginning.
The same is true of formula (8) and of Laplace’s “law of succession.”
The objection against using formulas (6) and (8) in circumstances where
we are not entitled to use them appears to us as irrefutable, and the
numerical applications made by Laplace and others cannot inspire much
confidence.

As an example of the extremes to which the illegitimate use of formulas 
(6) and (8) may lead, we quote from Laplace: 

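En faisant, par exemple, remonter la plus ancienne époque de l’histoire à
cinq mille ans, ou à 1,826,213 jours, et le Soleil s’étant levé constamment, dans
cet intervalle, à chaque révolution de vingt-quatre heures, il y a 1,826,214 à parier
contre un qu’il se lèvera encore demain.

[If, for example, we carry the most ancient epoch of history back five thousand
years, or 1,826,213 days, and the Sun has risen constantly in this interval at each
revolution of twenty-four hours, the odds are 1,826,214 to one that it will rise
again tomorrow.]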

It appears strange that as great a man as Laplace could make such a 
statement in earnest. However, under proper conditions, it would 
not be so objectionable. If, from the enormous number $N + 1$ of
urns containing each N black and white balls in all possible proportions, 
one urn is taken and 1,826,213 balls are drawn and returned, and they 
all turn out to be white, then nobody can deny that there are very nearly 
1,826,214 chances against one that the next ball will also be white. 

Problems for Solution 

1. Three urns of the same appearance have the following proportions of white and
black balls:

Urn 1: 1 white, 2 black balls
Urn 2: 2 white, 1 black ball
Urn 3: 2 white, 2 black balls

One of the urns is selected and one ball is drawn. It turns out to be white. What
is the probability that the third urn was chosen? Ans. 1/3.

2. Under the same conditions, what is the probability of drawing a white ball
again, the first one not having been returned? Ans. 1/3.

3. An urn containing 5 balls has been filled up by taking 5 balls from another urn,
which originally had 5 white and 5 black balls. A ball is taken from the first urn, and
it happens to be black. What is the probability of drawing a white ball from among
the remaining 4? Ans. 5/9.

4. From an urn containing 5 white and 5 black balls, 5 balls are transferred into an
empty second urn. From there, 3 balls are transferred into an empty third urn and,
finally, one ball is drawn from the latter. It turns out to be white. What is the
probability that all 5 balls transferred from the first urn are white? Ans. 1/126.

5. Conditions and notations being the same as in Prob. 3 (page 66), show that the
probability for an event to occur in the $(n + 1)$st trial, granted that it has occurred
in all the preceding $n$ trials, is never less than the probability for the same event to
occur in the $n$th trial, granting that it has occurred in the preceding $n - 1$ trials.

Hint: it must be proved that

$$\frac{\sum_{i=1}^{k} \alpha_i p_i^{n+1}}{\sum_{i=1}^{k} \alpha_i p_i^{n}} \geq \frac{\sum_{i=1}^{k} \alpha_i p_i^{n}}{\sum_{i=1}^{k} \alpha_i p_i^{n-1}}.$$

For that purpose, use Cauchy’s inequality

$$\left(\sum_{i=1}^{k} a_i b_i\right)^2 \leq \sum_{i=1}^{k} a_i^2 \cdot \sum_{i=1}^{k} b_i^2.$$

6. Assuming that the unknown probability $p$ of an event $E$ can have any value
between 0 and 1 and that the a priori probability of its being contained in the interval
$(\alpha, \beta)$ is equal to the length of this interval, prove the following theorem: The
probability a posteriori of the inequality

$$p \leq \sigma$$

after $E$ has occurred $m$ times in $n$ trials is equal to the probability of at least $m + 1$
successes in $n + 1$ independent trials with constant probability $\sigma$. (See Prob. 13,
page 69.)

7. Assumptions being the same as in the preceding problem, find approximately
the probability a posteriori of the inequalities

$$\frac{19}{40} \leq p \leq \frac{23}{40},$$

it being known that in 200 trials an event with the probability $p$ has occurred 105
times. Ans. Using the preceding problem and applying Markoff’s method, we find
$P = 0.846$.

8. An urn contains $N$ white and black balls in unknown proportion. The number
of white balls hypothetically may be

$$0, 1, 2, \ldots N$$

and all these hypotheses are considered as equally likely. Altogether $n$ balls are
taken from the urn, $m$ of which turned out to be white. Without returning these
balls, a new group of $n_1$ balls is taken, and it is required to find the probability that
among them there are $m_1$ white balls. Naturally, the total number of balls is so
large as to have $n + n_1 < N$. Ans. The required probability has the same expression

$$C_{n_1}^{m_1} \frac{\int_0^1 x^{m + m_1} (1 - x)^{n + n_1 - m - m_1}\,dx}{\int_0^1 x^m (1 - x)^{n-m}\,dx}$$

as in Prob. 4, page 69.

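Polynomials ordinarily called “Hermite’s polynomials,” although they were
discovered by Laplace, are defined by

$$H_n(y) = e^{\frac{y^2}{2}} \frac{d^n e^{-\frac{y^2}{2}}}{dy^n}.$$

The first four of them are

$$H_1(y) = -y; \quad H_2(y) = y^2 - 1; \quad H_3(y) = -y^3 + 3y; \quad H_4(y) = y^4 - 6y^2 + 3.$$

They possess the remarkable property of orthogonality:

$$\int_{-\infty}^{\infty} e^{-\frac{y^2}{2}} H_m(y) H_n(y)\,dy = 0 \quad \text{when} \quad m \neq n$$

while

$$\int_{-\infty}^{\infty} e^{-\frac{y^2}{2}} H_n(y)^2\,dy = \sqrt{2\pi}\; n!.$$

Under very general conditions, a function $f(y)$ defined in the interval $(-\infty, +\infty)$
can be represented by a series

$$f(y) = a_0 + a_1 H_1(y) + a_2 H_2(y) + \cdots$$

where in general

$$a_n = \frac{1}{\sqrt{2\pi}\; n!} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}} f(y) H_n(y)\,dy.$$

Let

$$\alpha = \frac{m}{n} \quad \text{and} \quad h^2 = \frac{n}{\alpha(1 - \alpha)}$$

provided $0 < \alpha < 1$.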

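9. Prove the validity of the following expansion indicated by Ch. Jordan:

$$\frac{(n + 1)!}{m!\,(n - m)!} x^m (1 - x)^{n-m} = \frac{h}{\sqrt{2\pi}} e^{-\frac{y^2}{2}} \left[ 1 - \frac{1 - 2\alpha}{n + 2}\, h H_1(y) + \frac{2n - (11n + 6)\alpha(1 - \alpha)}{2n(n + 2)(n + 3)}\, h^2 H_2(y) + \cdots \right]$$

for $0 \leq x \leq 1$ where $y$ is a new variable connected to $x$ by the equation

$$x = \alpha + \frac{y}{h}.$$

Hint: Consider the development in a series of Hermite’s polynomials of the
function

$$f(y) = e^{\frac{y^2}{2}} \left(\alpha + \frac{y}{h}\right)^m \left(1 - \alpha - \frac{y}{h}\right)^{n-m} \quad \text{for} \quad -h\alpha \leq y \leq h(1 - \alpha)$$

$$f(y) = 0 \quad \text{if either} \quad y < -h\alpha \quad \text{or} \quad y > h(1 - \alpha).$$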


10. Assuming that the conditions of validity of formula (6) are fulfilled, show that
the a posteriori probability of the inequalities

$$\frac{m}{n} - t\sqrt{\frac{2\alpha(1 - \alpha)}{n}} < p < \frac{m}{n} + t\sqrt{\frac{2\alpha(1 - \alpha)}{n}}; \qquad \alpha = \frac{m}{n}$$

can be expanded into a convergent series

$$P = \frac{2}{\sqrt{\pi}} \int_0^t e^{-u^2}\,du - \frac{t e^{-t^2}}{\sqrt{\pi}} \cdot \frac{2n - (11n + 6)\alpha(1 - \alpha)}{(n + 2)(n + 3)\alpha(1 - \alpha)} + \cdots$$

When $n$ is large and $\alpha$ is not near either to 0 or to 1, two terms of this series suffice
to give a good approximation to $P$ (Ch. Jordan). Apply this to Prob. 7.

Ans. 0.84585.

CHAPTER V


USE OF DIFFERENCE EQUATIONS IN SOLVING PROBLEMS 
OF PROBABILITY 

1. The combined use of the theorems of total and compound probability
very often leads to an equation in finite differences which, together
with the initial conditions supplied by a problem itself, serves to determine
an unknown probability. This method of attack is very powerful,
and it is often resorted to, especially in the more difficult cases. In this
chapter the use of equations in finite differences, applied to a few selected
and comparatively easy examples, will be shown; but in Chap. VIII
we shall apply the method to a class of interesting and historically
important problems.

Certain preliminary explanations are necessary at this point. Again
we consider a series of trials resulting in an event $E$ or its opposite, $F$,
but this time we suppose that the trials are dependent, so that the
probability of $E$ at a certain trial may vary according to the available
information concerning the results of some of the other trials.

A simple and interesting case of dependent trials arises if we suppose
that the probability of $E$ in the $(n + 1)$st trial receives a definite value
$\alpha$ if $E$ has happened in the preceding $n$th trial, and this value does not
change whatever further information we may possess concerning the
results of trials preceding the $n$th. Also, the probability of $E$ in the
$(n + 1)$st trial receives another determined value $\beta$ if $E$ failed in
the $n$th trial, no matter what happened in the trials preceding the $n$th.

We have a simple illustration of this kind of dependence, if we suppose
that drawings are made from an urn containing black and white balls in
a known proportion, and that each ball drawn is returned to the urn, but
only after the next drawing has been made. It is obvious that the probability
that the $(n + 1)$st ball drawn will be white, becomes perfectly
definite if we know what was the color of the ball immediately preceding,
and it remains the same no matter what we know about the colors of the
1, 2, . . . $(n - 1)$st balls.

If the trials depend on each other in the above-defined manner, we
say that they constitute a “simple chain,” to use the terminology of the
late A. A. Markoff, who was the first to make a profound study of
dependent trials of this and similar, but more complicated, types. It is
implied in the definition of a simple chain that it breaks into two separate
parts as soon as the result of a certain trial becomes known. For
instance, if the result of the fifth trial is known, trials 6, 7, 8, . . . become
independent of trials 1, 2, 3, 4, and the chain breaks into two distinct
parts: the trials preceding the fifth, and those following it. If the
results of trials 1, 2, 3, . . . $(n - 1)$ remain unknown, the event $E$
in the following $n$th trial has a certain probability which we shall denote
by $p_n$. Also, if it becomes known that $E$ happened at trial $k$, where
$k < n - 1$, the probability of $E$ happening in the $n$th trial receives a
different value, $p_n^{(k)}$. It is important to find means to determine the
probability $p_n$, the a priori probability of $E$ in the $n$th trial when the
results of the preceding trials remain unknown; as well as to determine
the probability $p_n^{(k)}$ of $E$ in the $n$th trial when we possess the positive
information that $E$ has materialized in the $k$th ($k < n - 1$) trial.

2. Thus we are led to the following problem concerning simple chains
of dependent trials:

Problem 1. The initial probability $p_1$ of the event $E$ in a simple
chain of trials being known, find the probability $p_n$ of $E$ in the $n$th trial
when the results of the preceding trials remain completely unknown.
Also, find the probability $p_n^{(k)}$ of $E$ in the $n$th trial when it is known that
$E$ has happened in the $k$th trial where $k < n - 1$.

Solution. In the $n$th trial the event $E$ can happen either preceded
by $E$ in the $(n - 1)$st trial, the probability of which is $p_{n-1}$, or preceded
by $F$ in the $(n - 1)$st trial, the probability of which is $1 - p_{n-1}$. By
the theorem of compound probability, the probability of the succession
$EE$ is $p_{n-1}\alpha$, while the probability of the succession $FE$ is $(1 - p_{n-1})\beta$.
Hence, the total probability $p_n$ is

$$p_n = \alpha p_{n-1} + \beta(1 - p_{n-1}) = (\alpha - \beta)p_{n-1} + \beta. \tag{1}$$

This is an ordinary equation in finite differences. It has a particular
solution

$$p_n = c = \text{const.}$$

where $c$ is determined by the equation

$$c = (\alpha - \beta)c + \beta,$$

whence

$$c = \frac{\beta}{1 + \beta - \alpha}$$

provided $1 + \beta - \alpha \neq 0$.¹ On the other hand, the corresponding
homogeneous equation

$$y_n = (\alpha - \beta)y_{n-1}$$

has a general solution

$$y_n = C(\alpha - \beta)^{n-1}$$

involving an arbitrary constant $C$. Adding to it the previously found
particular solution, we obtain the general solution of (1) in the form

$$p_n = C(\alpha - \beta)^{n-1} + \frac{\beta}{1 + \beta - \alpha}.$$

The arbitrary constant $C$ is determined by the initial condition

$$C + \frac{\beta}{1 + \beta - \alpha} = p_1$$

so that finally

$$p_n = \frac{\beta}{1 + \beta - \alpha} + \left(p_1 - \frac{\beta}{1 + \beta - \alpha}\right)(\alpha - \beta)^{n-1}.$$

If

$$p_1 = \frac{\beta}{1 + \beta - \alpha}$$

we see that $p_n$ does not depend on $n$ and is constantly equal to $p_1$. Because
we may exclude the cases $\alpha - \beta = 1$ or $\alpha - \beta = -1$, so that
$\alpha - \beta$ is contained between $-1$ and 1, we may conclude from the above
expression that $p_n$, if not a constant, at any rate tends to the limit

$$\frac{\beta}{1 + \beta - \alpha}$$

as $n$ increases indefinitely.

¹ If $1 + \beta - \alpha = 0$ or $\alpha - \beta = 1$, we necessarily have $\alpha = 1$, $\beta = 0$, which
means that $E$ must occur in all the trials if it actually occurs in the first trial, and
never occurs if it does not actually occur at the outset. This case, as well as the other
extreme case in which $\alpha - \beta = -1$, can therefore be excluded as not possessing real
interest.

As to $p_n^{(k)}$ we find in a similar way that it satisfies the equation

$$p_n^{(k)} = \alpha p_{n-1}^{(k)} + \beta(1 - p_{n-1}^{(k)}) \tag{2}$$

of the same form as equation (1). But the initial condition in this
case is $p_{k+1}^{(k)} = \alpha$ because the probability of $E$ happening in the $(k + 1)$st
trial is $\alpha$ when it is known that $E$ occurred in the preceding trial. The
solution of (2) satisfying this initial condition is

$$p_n^{(k)} = \frac{\beta}{1 + \beta - \alpha} + \frac{1 - \alpha}{1 + \beta - \alpha}(\alpha - \beta)^{n-k}.$$

As the second term in the right-hand member decreases with increasing
$n$ and finally becomes less than any given number, we see that the
positive information concerning the result of the $k$th trial has less and less
influence on the probability of $E$ in the following trials, and in remote
trials this influence becomes quite insignificant.


Example. An urn contains $a$ white and $b$ black balls, and a series of drawings of
one ball at a time is made, the ball removed being returned to the urn immediately
after the taking of the next following ball. What is the probability that the $n$th ball
drawn is white when: (a) nothing is known about the preceding drawings; (b) the $k$th
ball drawn is white?

In this particular problem we have

$$\alpha = \frac{a - 1}{a + b - 1}, \quad \beta = \frac{a}{a + b - 1}, \quad p_1 = \frac{a}{a + b}$$

and

$$\frac{\beta}{1 + \beta - \alpha} = \frac{a}{a + b} = p_1.$$

Thus

$$p_n = p_1 = \frac{a}{a + b}.$$

That is, the probability for any ball drawn to be white is the same as that for the
first ball, nothing being known about the results of the previous drawings. The
expression for $p_n^{(k)}$ is, in this example,

$$p_n^{(k)} = \frac{a}{a + b} + \frac{(-1)^{n-k}\, b}{(a + b)(a + b - 1)^{n-k}}.$$

So, for instance, if $a = 1$, $b = 2$, $n = 5$, $k = 3$,

$$p_5^{(3)} = \frac{1}{3} + \frac{2}{3 \cdot 2^2} = \frac{1}{2};$$

the information that the third ball was white raises to 1/2 the probability that the fifth
ball will be white; it would be 1/3 without such information.
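
The closed expressions for $p_n$ and $p_n^{(k)}$ can be compared with a direct iteration of equation (1). The following sketch is only an illustration with function names of our own choosing; it uses exact fractions and the data of the example above ($a = 1$, $b = 2$).

```python
from fractions import Fraction as F

def p_n_by_recurrence(p1, alpha, beta, n):
    """Iterate equation (1): p_n = alpha*p_{n-1} + beta*(1 - p_{n-1})."""
    p = p1
    for _ in range(n - 1):
        p = alpha * p + beta * (1 - p)
    return p

def p_n_closed_form(p1, alpha, beta, n):
    """Solution of equation (1) with the initial value p_1."""
    limit = beta / (1 + beta - alpha)
    return limit + (p1 - limit) * (alpha - beta) ** (n - 1)

def p_n_given_k(alpha, beta, n, k):
    """Probability of E in trial n when E is known to have occurred in trial k."""
    limit = beta / (1 + beta - alpha)
    return limit + (1 - alpha) / (1 + beta - alpha) * (alpha - beta) ** (n - k)

# Urn of the example with a = 1 white and b = 2 black balls.
a, b = 1, 2
alpha, beta, p1 = F(a - 1, a + b - 1), F(a, a + b - 1), F(a, a + b)
print(p_n_by_recurrence(p1, alpha, beta, 5))   # 1/3
print(p_n_given_k(alpha, beta, 5, 3))          # 1/2
```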


3. The next problem chosen to illustrate the use of difference equations
is interesting in several respects. It was first propounded and
solved by de Moivre.

Problem 2. In a series of independent trials, an event $E$ has the
constant probability $p$. If, in this series, $E$ occurs at least $r$ times in
succession, we say that there is a run of $r$ successes. What is the probability
of having a run of $r$ successes in $n$ trials, where naturally $n > r$?

Solution. Let us denote by $y_n$ the unknown probability of a run of
$r$ in $n$ trials. In $n + 1$ trials the probability of a run of $r$ will then be
$y_{n+1}$. Now, a run of $r$ in $n + 1$ trials can happen in two mutually
exclusive ways: first, if there is a run of $r$ in the first $n$ trials, and second,
if such a run can be obtained only in $n + 1$ trials. The probability of
the first hypothesis is $y_n$. To find the probability of the second hypothesis,
we observe that it requires the simultaneous realization of the following
conditions:

(a) There is no run of $r$ in the first $n - r$ trials, the probability of
which is $1 - y_{n-r}$. (b) In the $(n - r + 1)$st trial, $E$ does not occur,
the probability of which is $q = 1 - p$. (c) Finally, $E$ occurs in the
remaining $r$ trials, the probability of which is $p^r$.

As (a), (b), (c) are independent events, their simultaneous materialization
has the probability

$$(1 - y_{n-r})\, q p^r.$$

At the same time, this is the probability of the second hypothesis.
Adding it to $y_n$, we must obtain the total probability $y_{n+1}$. Thus

$$y_{n+1} = y_n + (1 - y_{n-r})\, p^r q \tag{3}$$

and this is an ordinary linear difference equation of the order $r + 1$.
Together with the obvious initial conditions

$$y_0 = y_1 = \cdots = y_{r-1} = 0, \qquad y_r = p^r$$

it serves to determine $y_n$ completely for $n = r + 1, r + 2, \ldots$. For
instance, taking $n = r$, we derive from (3)

$$y_{r+1} = p^r + p^r q.$$

Again, taking $n = r + 1$, we obtain

$$y_{r+2} = p^r + 2p^r q$$

and so forth. Although, proceeding thus, step by step, we can find the
required probability $y_n$ for any given $n$, this method becomes very laborious
for large $n$ and does not supply us with information as to the behavior
of $y_n$ for large $n$. It is preferable, therefore, to apply known methods of
solution to equation (3). First we can obtain a homogeneous equation
by introducing $z_n = 1 - y_n$ instead of $y_n$. The resulting equation in
$z_n$ is

$$z_{n+1} - z_n + q p^r z_{n-r} = 0 \tag{4}$$

and the corresponding initial conditions are:

$$z_0 = z_1 = \cdots = z_{r-1} = 1; \qquad z_r = 1 - p^r.$$

We could use the method of particular solutions as in the preceding
problem, but it is more convenient to use the method of generating
functions. The power series in $\xi$

$$\varphi(\xi) = z_0 + z_1 \xi + z_2 \xi^2 + \cdots$$

is the so-called generating function of the sequence $z_0, z_1, z_2, \ldots$.
If we succeed in finding its sum as a definite function of $\xi$, the development
of this function into power series will have precisely $z_n$ as the coefficient
of $\xi^n$. To obtain $\varphi(\xi)$ let us multiply both members of the preceding
series by the polynomial

$$1 - \xi + q p^r \xi^{r+1}.$$

The multiplication performed, we have

$$(1 - \xi + q p^r \xi^{r+1})\varphi(\xi) = z_0 + (z_1 - z_0)\xi + \cdots + (z_{r-1} - z_{r-2})\xi^{r-1} + (z_r - z_{r-1})\xi^r + (z_{r+1} - z_r + q p^r z_0)\xi^{r+1} + \cdots.$$

In the right-hand member the terms involving $\xi^{r+1}, \xi^{r+2}, \ldots$ have
vanishing coefficients by virtue of equation (4); also $z_k - z_{k-1} = 0$ for
$k = 1, 2, 3, \ldots r - 1$, while

$$z_0 = 1 \quad \text{and} \quad z_r - z_{r-1} = -p^r$$

so that

$$(1 - \xi + q p^r \xi^{r+1})\varphi(\xi) = 1 - p^r \xi^r$$

and

$$\varphi(\xi) = \frac{1 - p^r \xi^r}{1 - \xi + q p^r \xi^{r+1}}.$$

The generating function $\varphi(\xi)$ thus is a rational function and can be
developed into a power series of $\xi$ according to the known rules. The
coefficient of $\xi^n$ gives the general expression for $z_n$. Without any difficulty,
we find the following expression for $z_n$:

$$z_n = \beta_{n,r} - p^r \beta_{n-r,r} \tag{5}$$

where

$$\beta_{n,r} = \sum_{l=0}^{\lfloor n/(r+1) \rfloor} (-1)^l\, C_{n-lr}^{l}\, (q p^r)^l$$

and $\beta_{n-r,r}$ is obtained by substituting $n - r$ instead of $n$. If $n$ is not very
large compared with $r$, formula (5) can be used to compute $z_n$ and

$$y_n = 1 - z_n.$$

For instance, if $n = 20$, $r = 5$, and $p = q = 1/2$, we easily find

$$z_{20} = 1 - \frac{15}{64} + \frac{45}{64^2} - \frac{10}{64^3} - \frac{1}{32}\left(1 - \frac{10}{64} + \frac{10}{64^2}\right)$$

and hence

$$z_{20} = 0.75013$$

correct to five decimals; $y_{20} = 0.24987$ is the probability of a run of 5
heads in 20 tossings of a coin.
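
Both routes to this number, the step-by-step use of equation (3) and the closed formula (5), are easy to reproduce. The sketch below is an illustration only; the function names `run_probability`, `beta_sum`, and `no_run_probability` are our own, and exact fractions are assumed for $p$.

```python
from fractions import Fraction
from math import comb

def run_probability(n, r, p):
    """y_n from equation (3), starting from y_0 = ... = y_{r-1} = 0, y_r = p**r."""
    q = 1 - p
    y = [Fraction(0)] * r + [p ** r]
    for j in range(r, n):
        y.append(y[j] + (1 - y[j - r]) * p ** r * q)   # this is y_{j+1}
    return y[n]

def beta_sum(n, r, p):
    """The sum denoted beta_{n,r} in formula (5)."""
    q = 1 - p
    return sum((-1) ** l * comb(n - l * r, l) * (q * p ** r) ** l
               for l in range(n // (r + 1) + 1))

def no_run_probability(n, r, p):
    """z_n from formula (5), valid for n >= r."""
    return beta_sum(n, r, p) - p ** r * beta_sum(n - r, r, p)

half = Fraction(1, 2)
print(float(run_probability(20, 5, half)))      # about 0.24987
print(float(no_run_probability(20, 5, half)))   # about 0.75013
```

The recurrence costs one step per trial and is the simpler of the two for computation; formula (5) is the one that lends itself to analysis.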

4. But if $n$ is large in comparison with $r$, formula (5) would require
so much labor that it is preferable to seek for an approximate expression
for $z_n$ which will be useful for large values of $n$. It often happens, and
in many branches of mathematics, but especially so in the theory of
probability, that exact solutions of problems in certain cases are not of
any use. That raises the question of how to supplant them by convenient
approximate formulas that readily yield the required numbers.
Therefore, it is an important problem to find approximate formulas where
exact ones cease to work. Owing to the general importance of approximations,
it will not be out of order to enter into a somewhat long and
complicated investigation to obtain a workable approximate solution
of our problem in the interesting case of a large $n$.

Since $\varphi(\xi)$ is a rational function, the natural way to get an appropriate
expression of $z_n$ would be to resolve $\varphi(\xi)$ into simple fractions, corresponding
to various roots of the denominator, and expand those fractions in
power series of $\xi$. However, to attain definite conclusions following this
method, we must first seek information concerning roots of the equation

$$1 - \xi + q p^r \xi^{r+1} = 0.$$

6. Let 

/(«) = f - 1 - 

where 

a = - p). 

When p varies from 0 to 1, the maximum of p*‘(l — p) is attained for 
p == in all cases. 

To deal with the most interesting case, we shall assume 


( 6 ) 

which involves 


V < 


r 

r+1 


" ^ (r + 1)^+^ 

and we leave it to the reader to discover how the following discussion 

T 

should be modified if p ^ —r“T* 

^ r + 1 

When $\xi$ starts to increase from 0, the function $f(\xi)$ steadily increases
and attains a positive maximum for $\xi = \xi_0$ where

$$(r + 1)\,a\,\xi_0^r = 1$$

after which $f(\xi)$ decreases steadily to negative infinity. Hence, there
are two positive roots of the equation $f(\xi) = 0$: $\xi_1$, which is less than
$\frac{r+1}{r}$, and another root greater than this number. This root is $1/p$ if
condition (6) is fulfilled.

The remaining roots are all imaginary if r is odd and there is one 
negative root among them if r is even. 

Now we shall prove that the absolute value of every imaginary or
negative root is $> 1/p$. Let $\rho$ be the absolute value of any such root.
We have first

$$f(\rho) = \rho - 1 - a\rho^{r+1} < 0$$

so that $\rho$ belongs either to the interval $(0, \xi_1)$ or to the interval $(1/p, +\infty)$,
and if we can show that $\rho > \xi_0$ then $\rho$ can be only $> 1/p$. If the root we
consider is negative, $\rho$ satisfies the equation

$$F(\rho) = 1 + \rho - a\rho^{r+1} = 0$$

and since $F(\rho)$ increases till a positive maximum for $\rho = \xi_0$ is reached, and
then decreases, the root of $F(\rho) = 0$ is necessarily $> \xi_0$. If $\xi = \rho e^{i\theta}$ is
an imaginary root of $f(\xi) = 0$ we have, equating imaginary parts,

$$a\rho^r = \frac{\sin\theta}{\sin (r+1)\theta}. \tag{7}$$

But, whatever $\theta$ may be,

$$\left|\frac{\sin (r+1)\theta}{\sin\theta}\right| \le r + 1,$$

the equality sign being excluded if $\sin\theta \ne 0$.¹ Hence,

$$(r + 1)\,a\,\rho^r > 1$$

which implies $\rho > \xi_0$. The statement is thus completely proved.
6. The equation

$$\xi - 1 - a\xi^{r+1} = 0$$

can be exhibited in the form

$$\frac{1}{\xi} + a\xi^r = 1.$$

Substituting $\xi = \rho e^{i\theta}$ here, and again equating imaginary parts, we get

$$a\rho^{r+1}\sin r\theta = \sin\theta$$

and, combining this with (7),

$$\rho = \frac{\sin (r+1)\theta}{\sin r\theta}; \qquad a = \frac{(\sin r\theta)^r \sin\theta}{[\sin (r+1)\theta]^{r+1}}.$$


¹ The extreme values of the ratio $\frac{\sin m\theta}{\sin\theta}$ ($m$ integer $> 1$) correspond to certain
roots of the equation $m \sin\theta \cos m\theta = \sin m\theta \cos\theta$; but for every root of this equation

$$\left|\frac{\sin m\theta}{\sin\theta}\right| = \frac{m}{\sqrt{1 + (m^2 - 1)\sin^2\theta}} \le m.$$

The equality sign is excluded if $\sin\theta$ differs from 0.

If the imaginary part of $\xi$ is positive, the argument $\theta$ is contained
between 0 and $\pi$. In this case, it cannot be less than $\frac{\pi}{r+1}$ or greater
than $\pi - \frac{\pi}{r+1}$. For, if $0 < \theta < \frac{\pi}{r+1}$, then

$$\frac{\sin r\theta}{r\theta} > \frac{\sin (r+1)\theta}{(r+1)\theta}$$

or

$$\frac{\sin r\theta}{\sin (r+1)\theta} > \frac{r}{r+1}.$$

At the same time

$$\frac{\sin\theta}{\sin (r+1)\theta} > \frac{1}{r+1}$$

and hence

$$a = \left(\frac{\sin r\theta}{\sin (r+1)\theta}\right)^r \frac{\sin\theta}{\sin (r+1)\theta} > \frac{r^r}{(r+1)^{r+1}},$$

which is impossible. That $\theta$ cannot be greater than $\pi - \frac{\pi}{r+1}$ follows
simply, because in this case, $\sin (r+1)\theta$ and $\sin r\theta$ would be of opposite
signs and $\rho$ would be negative.


As

$$\frac{\pi}{r+1} \le \theta \le \pi - \frac{\pi}{r+1}$$

we have

$$\rho\sin\theta > \rho\sin\frac{\pi}{r+1}.$$

On the other hand, $\sin x > 2x/\pi$ if $0 < x < \pi/2$ and $\rho > 1/p$. Hence,

$$\rho\sin\theta > \frac{2}{(r+1)p}.$$

Thus, imaginary parts of all complex roots have the same lower bound

$$\frac{2}{(r+1)p}$$

of their absolute values.

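7. Denoting the roots of the equation $f(\xi) = 0$ by

$$\xi_k \quad (k = 1, 2, \ldots, r + 1)$$

we have

$$\varphi(\xi) = \sum_{k=1}^{r+1} \frac{1 - p\xi_k}{(1 - p)\,\xi_k\,(r + 1 - r\xi_k)\left(1 - \dfrac{\xi}{\xi_k}\right)}.$$

Hence, expanding each term into power series of $\xi$ and collecting
coefficients of $\xi^n$, we find

$$z_n = \sum_{k=1}^{r+1} \frac{1 - p\xi_k}{(1 - p)(r + 1 - r\xi_k)}\,\xi_k^{-n-1}.$$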

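For every imaginary root, we have

$$\left|\frac{(1 - p\xi_k)\,\xi_k^{-n-1}}{(1 - p)(r + 1 - r\xi_k)}\right| < \frac{r+1}{r(1 - p)}\,p^{n+2}$$

since

$$|\xi_k^{-n}| < p^n, \qquad \left|\frac{1 - p\xi_k}{\xi_k}\right| < 2p; \qquad \frac{1}{|r + 1 - r\xi_k|} < \frac{(r+1)p}{2r}.$$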

If $r$ is odd, there are $r - 1$ imaginary roots and the part in the expression
of $z_n$ due to them in absolute value is less than

$$\frac{(r+1)(r-1)}{r(1 - p)}\,p^{n+2} < \frac{r\,p^{n+2}}{1 - p}.$$

The term corresponding to the root $1/p$ vanishes, so that finally

$$z_n = \frac{\xi_1^{-n-1}}{1 - p}\cdot\frac{1 - p\xi_1}{r + 1 - r\xi_1} + \theta\,\frac{r\,p^{n+2}}{1 - p}$$

where $|\theta| < 1$ and $\xi_1$ denotes the least positive root of the equation

$$1 - \xi + q p^r \xi^{r+1} = 0.$$

If $r$ is even, there is one negative root. The part of $z_n$ corresponding
to this root is less than

$$\frac{2p^{n+2}}{(1 - p)r}$$

in absolute value. The whole contribution due to imaginary and negative roots is less than

$$\frac{r^2 - r}{r(1 - p)}\,p^{n+2} < \frac{r\,p^{n+2}}{1 - p}$$

in absolute value. Thus, no matter whether $r$ is odd or even, we have

$$z_n = \frac{\xi_1^{-n-1}}{1 - p}\cdot\frac{1 - p\xi_1}{r + 1 - r\xi_1} + \theta\,\frac{r\,p^{n+2}}{1 - p}; \qquad |\theta| < 1. \tag{8}$$

This is the required expression for $z_n$, excellently adapted to the case of a
large value for $n$, since then the remainder term involving $\theta$ is completely
negligible in comparison with the first principal term.





The root $\xi_1$ can be found either by direct solution of the trinomial
equation following Gauss' method, or by application of Lagrange's series.
Applying Lagrange's series, we have

$$\xi_1 = 1 + a + \sum_{l=2}^{\infty} \frac{(lr+2)(lr+3)\cdots(lr+l)}{1\cdot 2\cdots l}\,a^l$$

$$\log \xi_1 = a + \sum_{l=2}^{\infty} \frac{(lr+1)(lr+2)\cdots(lr+l-1)}{1\cdot 2\cdots l}\,a^l,$$

both series being convergent if $|a| < r^r/(r+1)^{r+1}$, and this condition is
satisfied.

8. Let us apply the approximate formula (8) to the case $p = q = \tfrac{1}{2}$
and $r = 10$. Using Lagrange's series, we find that

$$\xi_1 = 1.0004909$$

and

$$z_n = 1.003947 \cdot (1.0004909)^{-n}$$

up to the remainder term of (8), which is negligible for the values of $n$
considered here. Hence, for $n = 100$, 1,000, 10,000, respectively,

$$z_n = 0.9559; \quad 0.6146; \quad 0.0074$$

so that, for instance, the probabilities of a run of at least 10 heads in
100, 1,000, or 10,000 throws of a coin are, respectively,

$$0.0441; \quad 0.3854; \quad 0.9926.$$

Thus, in 10,000 throws, it is quite likely that heads would turn up 10 or 
more times in succession. 

In general, for a given r and increasing n, the probability tends to 1, 
so that in a very long series of trials, runs of any length are extremely 
likely to occur, a conclusion which at first sight seems paradoxical. 
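The figures of this example can be reproduced numerically. The sketch below is offered under stated assumptions rather than as the author's computation: `least_root` sums Lagrange's series for the least positive root $\xi_1$, `z_approx` evaluates only the principal term of the approximation (the remainder is dropped), and `z_exact` is an independent trial-by-trial computation used for comparison. All three function names are hypothetical.

```python
def least_root(p, r, terms=40):
    """Least positive root of 1 - xi + q p^r xi^(r+1) = 0 by Lagrange's series."""
    a = (1.0 - p) * p ** r
    total = 1.0 + a
    for l in range(2, terms):
        coef = 1.0
        for j in range(2, l + 1):
            coef *= (l * r + j) / j      # accumulates (lr+2)...(lr+l) / l!
        total += coef * a ** l
    return total


def z_approx(n, p, r):
    """Principal term of the approximation to z_n (no run of r in n trials)."""
    xi = least_root(p, r)
    return (1.0 - p * xi) / ((1.0 - p) * (r + 1 - r * xi)) * xi ** (-(n + 1))


def z_exact(n, p, r):
    """Exact z_n, following the length of the current run of successes."""
    q = 1.0 - p
    state = [0.0] * r
    state[0] = 1.0
    for _ in range(n):
        new = [0.0] * r
        for k, weight in enumerate(state):
            new[0] += weight * q
            if k + 1 < r:
                new[k + 1] += weight * p
        state = new
    return sum(state)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, 1.0 - z_approx(n, 0.5, 10), 1.0 - z_exact(n, 0.5, 10))
```

For $p = \tfrac12$, $r = 10$ the two columns printed should agree closely, which illustrates how small the neglected remainder is for large $n$.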

9. In the preceding examples, an unknown probability was deter¬ 
mined by an ordinary equation in finite differences. Very often, how¬ 
ever, probability as a function of two or more independent variables is 
defined by a partial difference equation in two or more independent 
variables, together with a set of initial conditions suggested by the 
problem itself. A few examples will suffice to illustrate the use of 
partial equations in finite differences and to give an idea of the two 
principal methods for their solution; namely, Laplace's method of 
generating functions, and the less well known, but elegant, method 
proposed by Lagrange. 

We start with an analytical solution of the problem which was
discussed in detail in Chap. III.

Problem 3. Find the probability of exactly $x$ successes in $t$
independent trials with the constant probability $p$.

Solution by Laplace’s Method. Let us denote the required
probability by $y_{x,t}$. To obtain $x$ successes in $t$ trials can be possible only in
two mutually exclusive ways: (a) by obtaining $x$ successes in $t - 1$ trials
and a failure at the last trial; (b) by obtaining success at the last trial
and $x - 1$ successes in the preceding $t - 1$ trials. The probability of
case (a) is $q\,y_{x,t-1}$ and that of case (b) is $p\,y_{x-1,t-1}$. The total probability
$y_{x,t}$ satisfies the equation

$$y_{x,t} = p\,y_{x-1,t-1} + q\,y_{x,t-1} \tag{9}$$

for all positive $x$ and $t$. This equation alone does not determine $y_{x,t}$
completely, but it does so in connection with certain initial conditions.
These conditions are

$$y_{x,0} = 0 \ \text{ if } x > 0, \qquad y_{0,t} = q^t \ \text{ if } t \ge 0. \tag{10}$$

The first set of equations is obvious; the second set is the expression
of the fact that if there are no successes in $t$ trials, the failures occur $t$
times in succession, and the probability for that is $q^t$.

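Following Laplace, we introduce for a given $t$ the generating function
of $y_{0,t};\ y_{1,t};\ y_{2,t};\ \ldots$, that is, the power series

$$\varphi_t(\xi) = y_{0,t} + y_{1,t}\,\xi + y_{2,t}\,\xi^2 + \cdots = \sum_{x=0}^{\infty} y_{x,t}\,\xi^x.$$

Taking $t - 1$ instead of $t$, separating the first term and multiplying by
$q$, we have

$$q\varphi_{t-1}(\xi) = q\,y_{0,t-1} + \sum_{x=1}^{\infty} q\,y_{x,t-1}\,\xi^x,$$

and similarly

$$p\xi\varphi_{t-1}(\xi) = \sum_{x=1}^{\infty} p\,y_{x-1,t-1}\,\xi^x.$$

Adding and noting equation (9) we obtain

$$(p\xi + q)\varphi_{t-1}(\xi) = \varphi_t(\xi) + q\,y_{0,t-1} - y_{0,t}$$

but because of (10)

$$q\,y_{0,t-1} - y_{0,t} = q^t - q^t = 0$$

and hence,

$$\varphi_t(\xi) = (p\xi + q)\varphi_{t-1}(\xi)$$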





for every positive $t$. Taking $t = 1, 2, 3, \ldots$ and performing successive
substitutions, we get

$$\varphi_t(\xi) = (p\xi + q)^t\varphi_0(\xi)$$

and it remains only to find

$$\varphi_0(\xi) = y_{0,0} + y_{1,0}\,\xi + y_{2,0}\,\xi^2 + \cdots.$$

But on account of (10), $y_{x,0} = 0$ for $x > 0$, while $y_{0,0} = 1$. Thus,

$$\varphi_0(\xi) = 1$$

and

$$\varphi_t(\xi) = (p\xi + q)^t.$$

To find $y_{x,t}$ it remains to develop the right-hand member in a power series
of $\xi$ and to find the coefficient of $\xi^x$. The binomial theorem readily gives

$$y_{x,t} = \frac{t(t-1)\cdots(t-x+1)}{1\cdot 2\cdots x}\,p^x q^{t-x}.$$
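Equation (9) with the initial conditions (10) can also be iterated directly, which gives a quick numerical check of the binomial expression. The sketch below assumes nothing beyond (9) and (10); the name `successes_table` is illustrative.

```python
from math import comb


def successes_table(t_max, p):
    """y[t][x] = probability of exactly x successes in t trials, from equation (9)."""
    q = 1.0 - p
    y = [[0.0] * (t_max + 1) for _ in range(t_max + 1)]
    for t in range(t_max + 1):
        y[t][0] = q ** t                      # condition (10): y_{0,t} = q^t
    # y[0][x] = 0 for x > 0 is already in place (condition (10))
    for t in range(1, t_max + 1):
        for x in range(1, t_max + 1):
            y[t][x] = p * y[t - 1][x - 1] + q * y[t - 1][x]
    return y


if __name__ == "__main__":
    table = successes_table(6, 0.3)
    # compare with the closed form C(t, x) p^x q^(t-x)
    print(table[6][2], comb(6, 2) * 0.3 ** 2 * 0.7 ** 4)
```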




10. Poisson’s Series of Trials. The analytical method thus enables
us to find the same expression for probabilities in a Bernoullian series
of trials as that obtained in Chap. III by elementary means. Considering
how simple it is to arrive at this expression, it may appear that a new
deduction of a known result is not a great gain. But one must bear in
mind that a little modification of the problem may bring new difficulties
which may be more easily overcome by the new method than by a
generalization of the old one. Poisson substituted for the Bernoullian series
another series of independent trials with probability varying from
trial to trial, so that in trials 1, 2, 3, 4, . . . the same event $E$ has different
probabilities $p_1, p_2, p_3, p_4, \ldots$ and correspondingly, the opposite event
has probabilities $q_1, q_2, q_3, q_4, \ldots$ where $q_k = 1 - p_k$, in general. Now,
for the Poisson series, the same question may be asked: what is the
probability $y_{x,t}$ of obtaining $x$ successes in $t$ trials? The solution of this
generalized problem is easier and more elegant if we make use of
difference equations.

First, in the same manner as before, we can establish the equation in
finite differences

$$y_{x,t} = p_t\,y_{x-1,t-1} + q_t\,y_{x,t-1}. \tag{11}$$

The corresponding set of initial conditions is

$$y_{x,0} = 0 \ \text{ if } x > 0; \qquad y_{0,t} = q_1 q_2 \cdots q_t \ \text{ if } t > 0; \qquad y_{0,0} = 1. \tag{12}$$

Giving $\varphi_t(\xi)$ the same meaning as above, we have

$$q_t\varphi_{t-1}(\xi) = q_t\,y_{0,t-1} + \sum_{x=1}^{\infty} q_t\,y_{x,t-1}\,\xi^x$$

$$p_t\xi\varphi_{t-1}(\xi) = \sum_{x=1}^{\infty} p_t\,y_{x-1,t-1}\,\xi^x$$

whence

$$(p_t\xi + q_t)\varphi_{t-1}(\xi) = \varphi_t(\xi) + q_t\,y_{0,t-1} - y_{0,t};$$

but because of (12)

$$q_t\,y_{0,t-1} - y_{0,t} = q_1 q_2 \cdots q_t - q_1 q_2 \cdots q_t = 0,$$

and thus

$$\varphi_t(\xi) = (p_t\xi + q_t)\varphi_{t-1}(\xi)$$

whence again

$$\varphi_t(\xi) = (p_1\xi + q_1)(p_2\xi + q_2)\cdots(p_t\xi + q_t)\varphi_0(\xi).$$

However, by virtue of (12), $\varphi_0(\xi) = 1$ so that finally

$$\varphi_t(\xi) = (p_1\xi + q_1)(p_2\xi + q_2)\cdots(p_t\xi + q_t).$$

To find the probability of $x$ successes in $t$ trials in Poisson’s case, one
needs only to develop the product

$$(p_1\xi + q_1)(p_2\xi + q_2)\cdots(p_t\xi + q_t)$$

according to ascending powers of $\xi$ and to find the coefficient of $\xi^x$.
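Developing the product amounts to multiplying out linear factors one at a time. A minimal sketch of that bookkeeping follows; the function name `poisson_trials_distribution` and the list-of-coefficients representation are assumptions made for illustration.

```python
def poisson_trials_distribution(probs):
    """Coefficients of prod_k (p_k*xi + q_k); entry x is P(exactly x successes)."""
    coeffs = [1.0]                       # phi_0(xi) = 1
    for p in probs:
        q = 1.0 - p
        nxt = [0.0] * (len(coeffs) + 1)
        for x, c in enumerate(coeffs):
            nxt[x] += q * c              # this trial fails: power of xi unchanged
            nxt[x + 1] += p * c          # this trial succeeds: power of xi raised by 1
        coeffs = nxt
    return coeffs


if __name__ == "__main__":
    # two trials with success probabilities 1/2 and 1/4
    print(poisson_trials_distribution([0.5, 0.25]))
```

Each pass through the outer loop is one application of equation (11), so the list after $t$ factors holds $y_{0,t}, y_{1,t}, \ldots, y_{t,t}$.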

11. Solution by Lagrange’s Method. We shall now apply to
equation (9) the ingenious method devised by Lagrange, with a slight
modification intended to bring into full light the fundamental idea underlying this
method. Equation (9) possesses particular solutions of the form

$$\alpha^x\beta^t$$

if $\alpha$ and $\beta$ are connected by the equation

$$\alpha\beta = p + q\alpha.$$

Solving this equation for $\beta$, we find infinitely many particular solutions

$$\alpha^x(q + p\alpha^{-1})^t$$

where $\alpha$ is absolutely arbitrary. Multiplying this expression by an
arbitrary function $\varphi(\alpha)$ and integrating between arbitrary limits, we
obtain other solutions of equation (9). Now the question arises of how
to choose $\varphi(\alpha)$ and the path of integration to satisfy not only equation (9)
but also initial conditions (10). We shall assume that $\varphi(\alpha)$ is a regular
function of a complex variable $\alpha$ in a ring between two concentric circles,
with their center at the origin, and that it can therefore be represented in
this ring by Laurent’s series

$$\varphi(\alpha) = \sum_{n=-\infty}^{\infty} c_n\alpha^n.$$

If $c$ is a circle concentric with the regularity ring of $\varphi(\alpha)$ and situated
inside it, the integral

$$y_{x,t} = \frac{1}{2\pi i}\int_c \alpha^{x-1}(q + p\alpha^{-1})^t\,\varphi(\alpha)\,d\alpha$$

is perfectly determined and represents a solution of (9). To satisfy
the initial conditions, we have first the set of equations

$$\frac{1}{2\pi i}\int_c \alpha^{x-1}\varphi(\alpha)\,d\alpha = 0 \quad \text{for } x = 1, 2, 3, \ldots$$

which show that all the coefficients $c_n$ with negative subscripts vanish,
and that $\varphi(\alpha)$ is regular about the origin. The second set of equations
obtained by setting $x = 0$

$$\frac{1}{2\pi i}\int_c \alpha^{-1}(q + p\alpha^{-1})^t\,\varphi(\alpha)\,d\alpha = q^t \quad \text{for } t = 0, 1, 2, \ldots$$

serves to determine $\varphi(\alpha)$. If $\epsilon$ is a sufficiently small complex parameter,
this set of equations is entirely equivalent to a single equation:

$$\frac{1}{2\pi i}\int_c \frac{\varphi(\alpha)\,d\alpha}{\alpha - \epsilon(p + q\alpha)} = \frac{1}{1 - \epsilon q}.$$

Now the integrand within the circle $c$ has a single pole $\alpha_0$ determined by
the equation

$$\alpha_0 = \epsilon(p + q\alpha_0)$$

and the corresponding residue is

$$\frac{\varphi(\alpha_0)}{1 - q\epsilon}.$$

At the same time, this is the value of the left-hand member of the above
equation, so that

$$\frac{\varphi(\alpha_0)}{1 - q\epsilon} = \frac{1}{1 - q\epsilon}$$

or

$$\varphi(\alpha_0) = 1$$

for all sufficiently small $\epsilon$ or $\alpha_0$. That is, $\varphi(\alpha) = 1$ and

$$y_{x,t} = \frac{1}{2\pi i}\int_c \alpha^{x-1}(q + p\alpha^{-1})^t\,d\alpha$$

is the required solution. It remains to find the residue of the integrand;
that is, the coefficient of $1/\alpha$ in the development of

$$\alpha^{x-1}(q + p\alpha^{-1})^t$$

in series of ascending powers of $\alpha$. That can be easily done, using the
binomial development, and we obtain

$$y_{x,t} = C_t^x\,p^x q^{t-x}$$

as it should be.

12. Problem 4. Two players, $A$ and $B$, agree to play a series of
games on the condition that $A$ wins the series if he succeeds in winning $a$
games before $B$ wins $b$ games. The probability of winning a single game
is $p$ for $A$ and $q = 1 - p$ for $B$, so that each game must be won by either
$A$ or $B$. What is the probability that $A$ will win the series?

Solution. This historically important problem was proposed as an
exercise (Prob. 12, page 58) with a brief indication of its solution based
on elementary principles. To solve it analytically, let us denote by
$y_{x,t}$ the probability that $A$ will win when $x$ games remain for him to win,
while his adversary $B$ has $t$ games left to win. Considering the result
of the game immediately following, we distinguish two alternatives:
(a) $A$ wins the next game (probability $p$) and has to win $x - 1$ games
before $B$ wins $t$ games (probability $y_{x-1,t}$); (b) $A$ loses the next game
(probability $q$) and has to win $x$ games before $B$ can win $t - 1$ games
(probability $y_{x,t-1}$). The probabilities of these two alternatives being
$p\,y_{x-1,t}$ and $q\,y_{x,t-1}$, their sum is the total probability $y_{x,t}$. Thus, $y_{x,t}$
satisfies the equation

$$y_{x,t} = p\,y_{x-1,t} + q\,y_{x,t-1}. \tag{13}$$

Now, $y_{x,0} = 0$ for $x > 0$, which means that $A$ cannot win, $B$ having
won all his games. Also, $y_{0,t} = 1$ for $t > 0$, which means that $A$ surely
wins when he has no more games to win. The initial conditions in our
problem are, therefore,

$$y_{x,0} = 0 \ \text{ if } x > 0; \qquad y_{0,t} = 1 \ \text{ if } t > 0. \tag{14}$$

The symbol $y_{0,0}$ has no meaning as a probability, and remains undefined.
For the sake of simplicity we shall assume, however, that $y_{0,0} = 0$.

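Application of Laplace’s Method. Again, let

$$\varphi_x(\xi) = y_{x,0} + y_{x,1}\,\xi + y_{x,2}\,\xi^2 + \cdots$$

be the generating function of the sequence $y_{x,0};\ y_{x,1};\ y_{x,2};\ \ldots$
corresponding to an arbitrary $x > 0$. We have

$$q\xi\varphi_x(\xi) = \sum_{t=1}^{\infty} q\,y_{x,t-1}\,\xi^t$$

$$p\varphi_{x-1}(\xi) = p\,y_{x-1,0} + \sum_{t=1}^{\infty} p\,y_{x-1,t}\,\xi^t$$

and

$$q\xi\varphi_x(\xi) + p\varphi_{x-1}(\xi) = p\,y_{x-1,0} + \sum_{t=1}^{\infty} (p\,y_{x-1,t} + q\,y_{x,t-1})\,\xi^t$$

or, because of (13),

$$q\xi\varphi_x(\xi) + p\varphi_{x-1}(\xi) = p\,y_{x-1,0} - y_{x,0} + \varphi_x(\xi).$$

Now, for every $x > 0$

$$y_{x,0} = y_{x-1,0} = 0$$

in conformity with the first set of initial conditions, which allows us to
present the preceding relation as follows:

$$\varphi_x(\xi) = \frac{p}{1 - q\xi}\,\varphi_{x-1}(\xi)$$

whence

$$\varphi_x(\xi) = \frac{p^x}{(1 - q\xi)^x}\,\varphi_0(\xi).$$

But

$$\varphi_0(\xi) = y_{0,0} + y_{0,1}\,\xi + y_{0,2}\,\xi^2 + \cdots = \xi + \xi^2 + \xi^3 + \cdots = \frac{\xi}{1 - \xi}$$

and finally

$$\varphi_x(\xi) = \frac{p^x\,\xi}{(1 - \xi)(1 - q\xi)^x}.$$

It remains to develop the right-hand member in a power series of $\xi$ and
find the coefficient of $\xi^t$. As

$$\frac{\xi}{1 - \xi} = \xi + \xi^2 + \xi^3 + \cdots$$

and

$$\frac{1}{(1 - q\xi)^x} = 1 + \frac{x}{1}\,q\xi + \frac{x(x+1)}{1\cdot 2}\,q^2\xi^2 + \cdots$$

we readily get, multiplying these series according to the ordinary rules,

$$y_{x,t} = p^x\left[1 + \frac{x}{1}\,q + \frac{x(x+1)}{1\cdot 2}\,q^2 + \cdots + \frac{x(x+1)\cdots(x+t-2)}{1\cdot 2\cdots(t-1)}\,q^{t-1}\right]$$

which coincides with the elementary solution indicated on page 58.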
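The closed expression can be compared with a direct evaluation of equation (13) under the initial conditions (14). The following sketch is illustrative only; both function names are assumptions.

```python
from functools import lru_cache
from math import comb


def win_probability(a, b, p):
    """Closed form: p^a * sum_{k=0}^{b-1} C(a+k-1, k) q^k, for a, b >= 1."""
    q = 1.0 - p
    return p ** a * sum(comb(a + k - 1, k) * q ** k for k in range(b))


def win_probability_recursive(a, b, p):
    """Same probability from equation (13) with initial conditions (14)."""
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def y(x, t):
        if x == 0:
            return 1.0      # A has no more games to win (t > 0 here)
        if t == 0:
            return 0.0      # B has already won all his games
        return p * y(x - 1, t) + q * y(x, t - 1)

    return y(a, b)


if __name__ == "__main__":
    # A needs 2 games, B needs 3, equal skill
    print(win_probability(2, 3, 0.5), win_probability_recursive(2, 3, 0.5))
```

Starting from $x, t \ge 1$ the recursion never reaches the undefined symbol $y_{0,0}$, so no convention for it is needed in the code.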

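Application of Lagrange’s Method. Equation (13) has particular
solutions of the form

$$\alpha^x\beta^t$$

where

$$\alpha\beta = p\beta + q\alpha.$$

Hence, we can either express $\alpha$ by $\beta$ or $\beta$ by $\alpha$. Leaving it to the reader
to follow the second alternative, we shall express $\alpha$ as a function of $\beta$
and seek the required solution in the form

$$y_{x,t} = \frac{p^x}{2\pi i}\int_c \frac{\beta^t\,\varphi(\beta)\,d\beta}{(1 - q\beta^{-1})^x}$$

where $\varphi(\beta)$ is again supposed to be developable in Laurent’s series in a
certain ring; $c$ is a circle described about the origin and entirely within
that ring. Setting $x = 0$, we must have

$$\frac{1}{2\pi i}\int_c \beta^t\,\varphi(\beta)\,d\beta = 1 \quad \text{for } t = 1, 2, 3, \ldots$$

and this set of equations is satisfied if we take

$$\varphi(\beta) = \frac{1}{\beta^2} + \frac{1}{\beta^3} + \cdots = \frac{1}{\beta(\beta - 1)}; \qquad |\beta| > 1.$$

Now we have

$$y_{x,t} = \frac{p^x}{2\pi i}\int_c \frac{\beta^{t-1}\,d\beta}{(1 - q\beta^{-1})^x(\beta - 1)}$$

and for $t = 0$

$$y_{x,0} = \frac{p^x}{2\pi i}\int_c \frac{d\beta}{\beta(1 - q\beta^{-1})^x(\beta - 1)} = 0$$

as it should be, because for $|\beta| > 1$ the integrand can be developed into a
power series of $1/\beta$, the term with $1/\beta$ being absent. Thus, the required
solution is given by

$$y_{x,t} = \frac{p^x}{2\pi i}\int_c \frac{\beta^{t-1}\,d\beta}{(1 - q\beta^{-1})^x(\beta - 1)}$$

where $c$ is a circle of radius $> 1$ described about the origin. The final
expression for $y_{x,t}$ is obtained as the coefficient of $1/\beta$ in the development of

$$\frac{\beta^{t-1}}{(1 - q\beta^{-1})^x(\beta - 1)}$$

into power series of $1/\beta$. We obtain the same expression as before.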


Problems for Solution 

1. Each of $n$ urns contains $a$ white and $b$ black balls. One ball is transferred from
the first urn into the second, another one from the second into the third, and so on.
Finally, a ball is drawn from the $n$th urn. What is the probability that it is white,
when it is known that the first ball transferred was white?

Ans. $\dfrac{a}{a+b} + \dfrac{b}{a+b}\,(a + b + 1)^{1-n}$.





2. Two urns contain, respectively, $a$ white and $b$ black, and $b$ white and $a$ black
balls. A series of drawings is made, according to the following rules:

a. Each time only one ball is drawn and immediately returned to the same urn it
came from.

b. If the ball drawn is white, the next drawing is made from the first urn.

c. If it is black, the next drawing is made from the second urn.

d. The first ball drawn comes from the first urn.

What is the probability that the $n$th ball drawn will be white?

Ans. $p_n = \dfrac{1}{2} + \dfrac{1}{2}\left(\dfrac{a - b}{a + b}\right)^n$.



3. Find the probability of a run of 5 in a series of 15 trials with constant
probability $p = \tfrac{1}{3}$. Ans. $y_{15} = 23 \cdot 3^{-6} - 70 \cdot 3^{-12} = 0.0314184$.

4. How many throws of a coin suffice to give a probability of more than 0.999 for
a run of at least 100 heads? Ans. $1.76 \cdot 10^{31}$ throws suffice.

5. What is the least number of trials assuring a probability of $\ge \tfrac{1}{2}$ for a run of at
least 10 successes if $p = q = \tfrac{1}{2}$? Ans. 1,420.

6. Seven urns contain black and white balls in the following proportions: 


Urns. 

1 



B 

B 


H 

White. 

1 




B 


B 

Black. 

2 

1 

2 

1 

B 


m 


One ball is drawn from each urn. What is the probability that there will be among 
them exactly 3 white balls? Ana. Coefficient of in. 

+ mu + mu -'r mu + mu + f)(«^ + mu + u 

or 

if J = 0.28025. 

7. Two players, each possessing $2, agree to play a series of games. The
probability of winning a single game is $\tfrac{1}{2}$ for both, and the loser pays $1 to his adversary
after each game. Find the probability for each one of them to be ruined at or before
the $n$th game.

Solution. Let $y_m$ be the probability that after playing $2m$ games, neither of the
players is ruined. We have

$$y_{m+1} = \tfrac{1}{2}\,y_m$$

and hence

$$y_m = \frac{1}{2^m}.$$

The probability for one of the players to be ruined at or before the $n$th game is

$$\frac{1}{2} - \frac{1}{2^{m+1}}$$

if $n = 2m$ or $n = 2m + 1$.

8. Solve the same problem if each player enters the game with $3.

Ans. $\dfrac{1}{2} - \dfrac{1}{2}\left(\dfrac{3}{4}\right)^{m-1}$ if $n = 2m - 1$ or $n = 2m$.

9. Players $A_1, A_2, \ldots, A_{n+1}$ play a series of games in the following order: first $A_1$
plays with $A_2$; the loser is out and the winner plays with the following player, $A_3$; the
loser is out again and the next game is played with $A_4$, and so on; the loser always being
out and his place taken by the next following player. The probability of winning a
single game is $\tfrac{1}{2}$ for each player and the series is won by the player who succeeds in
winning over all his adversaries in succession. What is the probability that the
series will stop exactly at the $x$th game? What is the probability that the series will
stop before or at the $x$th game?

Solution. Let $y_x$ be the probability that the series terminates exactly at the $x$th
game. That means that the player who won the game entered at the $(x - n + 1)$st
game and won successively the $n$ following games. Now, there are $n - 1$ cases
to be distinguished according as the player beaten at the $(x - n + 1)$st game has
already won $1, 2, 3, \ldots, n - 1$ games. Let $P_k$ be the probability that the loser in the
$(x - n + 1)$st game previously has won $k$ games. The probability of ending the
series in this case is $P_k/2^n$. On the other hand,

$$\frac{P_k}{2^{n-k}} = y_{x-k}$$

so that

$$\frac{P_k}{2^n} = \frac{y_{x-k}}{2^k}.$$

Hence, for $x > n$

$$y_x = \frac{1}{2}\,y_{x-1} + \frac{1}{2^2}\,y_{x-2} + \cdots + \frac{1}{2^{n-1}}\,y_{x-n+1}.$$

Initial conditions:

$$y_1 = y_2 = \cdots = y_{n-1} = 0; \qquad y_n = \frac{1}{2^{n-1}}.$$


The generating function of y»: 


yi + ya^ + yi^* 4- 



and the generating function of the probability that the series will end before or at the 
xth game is 



10. Three players, $A$, $B$, $C$, play a series of games, each game being won by one of
them. If the probabilities for $A$, $B$, $C$ to win a single game are $p$, $q$, $r$, find the
probability of $A$ winning $a$ games before $B$ and $C$ win $b$ and $c$ games, respectively.

Solution. Let $A_{x,y,z}$ denote the probability for $A$ to win the series when he has
still to win $x$ games, while $B$ and $C$ have to win $y$ and $z$ games, respectively. First,
we can establish the equation

$$A_{x,y,z} = p\,A_{x-1,y,z} + q\,A_{x,y-1,z} + r\,A_{x,y,z-1}.$$

Next, $A_{0,y,z} = 1$ for positive $y$, $z$, and $A_{x,0,z} = 0$ for positive $x$, $z$; $A_{x,y,0} = 0$ for
positive $x$, $y$. Besides, although this is only a formal simplification, we shall assume
$A_{x,0,z} = 0$, $A_{x,y,0} = 0$ when $x$ or $y$ or $z$ vanishes. For the generating function of
$A_{x,y,z}$

$$\Phi_x(\xi, \eta) = \sum_{y,z=0}^{\infty} A_{x,y,z}\,\xi^y\eta^z$$

we find the equation

$$\Phi_x(\xi, \eta) = \frac{p}{1 - q\xi - r\eta}\,\Phi_{x-1}(\xi, \eta)$$

whence

$$\Phi_x(\xi, \eta) = \frac{p^x}{(1 - q\xi - r\eta)^x}\cdot\frac{\xi\eta}{(1 - \xi)(1 - \eta)}.$$

The final answer is

$$A_{a,b,c} = p^a\left[1 + \frac{a}{1}(q + r) + \frac{a(a+1)}{1\cdot 2}(q + r)^2 + \frac{a(a+1)(a+2)}{1\cdot 2\cdot 3}(q + r)^3 + \cdots\right]',$$

the dash indicating that powers of $q$ and $r$ with the exponents $\ge b$ and $\ge c$ are omitted.

Obviously, the same method can be extended to any number of players, and leads 
to a perfectly analogous expression of probability. 

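11. An urn contains $n$ balls altogether, and among them $a$ white balls. In a series
of drawings, each time one ball is drawn, whatever its color may be, it is replaced by
a white ball. Find the probability $y_{x,r}$ that after $r$ drawings there are $x$ white balls
in the urn.

Solution. The required probability satisfies the equation

$$y_{x,r+1} = \frac{n - x + 1}{n}\,y_{x-1,r} + \frac{x}{n}\,y_{x,r}.$$

Besides,

$$y_{a,0} = 1, \qquad y_{x,0} = 0 \ \text{ if } x \ne a, \qquad y_{x,r} = 0 \ \text{ if } x < a.$$

From the preceding equation, combined with the initial conditions, we find
successively

$$y_{a,r} = \left(\frac{a}{n}\right)^r$$

$$y_{a+1,r} = (n - a)\left[\left(\frac{a+1}{n}\right)^r - \left(\frac{a}{n}\right)^r\right]$$

$$y_{a+2,r} = \frac{(n - a)(n - a - 1)}{1\cdot 2}\left[\left(\frac{a+2}{n}\right)^r - 2\left(\frac{a+1}{n}\right)^r + \left(\frac{a}{n}\right)^r\right]$$

and so on.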


12. If, in the problem of runs, p is supposed to be > ^ prove that the probabil¬ 


ity of a run of r in n trials is greater than 




p - Pi 


r(p + Pi) 


where pi < 


r + l 


(r + l)pi 


is a root of the equation 


\j^ 

/I - Pi 


$$p_1^r(1 - p_1) = p^r(1 - p).$$

13. To find an asymptotic expression of probability for a run of $r$ in $n$ independent
trials, if $p \ge \frac{r}{r+1}$, the following proposition is of importance: Imaginary and
negative roots of the equation


(1 — a)x* —x+«=0; 0<«^ -- 

n — 1 

are, in absolute value, greater than the root 72 > 1 of the equation 

2x 

(1 — a)R* — R + 8 cos — « 0. 

n 

Prove the truth of this statement. 

14. Given $s$ urns containing the same number $n$ of black and white balls in known
proportions, drawings are made in the following manner: first, a single ball is drawn
out of every urn; second, the ball drawn from the first urn is placed into the second;
that drawn from the second is placed in the third, and so on; finally, the ball drawn
from the last urn is placed in the first, so that again every urn contains $n$ balls.
Supposing that this operation is repeated $t$ times, find the probability of drawing a white
ball from the $x$th urn.

Solution. Let $y_{x,t}$ be the required probability. First, it can be shown that it
satisfies the equation

$$y_{x,t} = \left(1 - \frac{1}{n}\right)y_{x,t-1} + \frac{1}{n}\,y_{x-1,t-1}.$$

The initial probabilities $y_{1,0}, y_{2,0}, \ldots, y_{s,0}$ are known; and, moreover, the function
$y_{x,t}$ must satisfy a boundary condition of the periodic type, $y_{0,t} = y_{s,t}$. Hence,
applying Lagrange’s method, the following solution is found

$$y_{x,t} = \left(1 - \frac{1}{n}\right)^t\left[f(x) + \frac{t}{1}\,\frac{f(x - 1)}{n - 1} + \frac{t(t - 1)}{1\cdot 2}\,\frac{f(x - 2)}{(n - 1)^2} + \cdots\right]$$

where

$$f(x) = y_{x,0} \ \text{ when } x > 0$$

and the definition is extended to $x \le 0$ by setting

$$f(-x) = f(s - x).$$

If, to begin with, all urns contain the same number of white and black balls, so that
$f(x) = \text{const.} = p$, we shall have, no matter what $t$ is, $y_{x,t} = p$.

CHAPTER VI


BERNOULLI’S THEOREM

1. This chapter will be devoted to one of the most important and
beautiful theorems in the theory of probability, discovered by Jacob
Bernoulli and published with a proof remarkably rigorous (save for some
irrelevant limitations assumed in the proof) in his admirable posthumous
book “Ars conjectandi” (1713). This book is the first attempt at
scientific exposition of the theory of probability as a separate branch of
mathematical science.

If, in $n$ trials, an event $E$ occurs $m$ times, the number $m$ is called the
“frequency” of $E$ in $n$ trials, and the ratio $m/n$ receives the name of
“relative frequency.” Bernoulli’s theorem reveals an important
probability relation between the relative frequency of $E$ and its probability $p$.

Bernoulli’s Theorem. With the probability approaching 1 or certainty
as near as we please, we may expect that the relative frequency of an event $E$
in a series of independent trials with constant probability $p$ will differ from
that probability by less than any given number $\varepsilon > 0$, provided the number
of trials is taken sufficiently large.

In other words, given two positive numbers $\varepsilon$ and $\eta$, the probability
$P$ of the inequality

$$\left|\frac{m}{n} - p\right| < \varepsilon$$

will be greater than $1 - \eta$ if the number of trials is above a certain
limit depending upon $\varepsilon$ and $\eta$.

Proof. Several proofs of this important theorem are known which
are shorter and simpler but less natural than Bernoulli’s original proof.
It is his remarkable proof that we shall reproduce here in modernized
form.

a. Denoting by $T_m$, as usual, the probability of $m$ successes in $n$ trials,
we shall show first that

$$\frac{T_{b+k}}{T_b} < \frac{T_{a+k}}{T_a} \tag{1}$$

if $b > a$ and $k > 0$. Since the ratio

$$\frac{T_{x+1}}{T_x} = \frac{n - x}{x + 1}\cdot\frac{p}{q}$$

decreases as $x$ increases, we have for $b > a$

$$\frac{T_{b+1}}{T_b} < \frac{T_{a+1}}{T_a}$$

or

$$\frac{T_{b+1}}{T_{a+1}} < \frac{T_b}{T_a}.$$

Changing $b$, $a$, respectively, into $b + 1$, $a + 1$; $b + 2$, $a + 2$; $\ldots$; $b + k - 1$,
$a + k - 1$, it follows from the last inequality that

$$\frac{T_{b+k}}{T_{a+k}} < \frac{T_{b+k-1}}{T_{a+k-1}} < \cdots < \frac{T_{b+1}}{T_{a+1}} < \frac{T_b}{T_a}$$

whence

$$\frac{T_{b+k}}{T_b} < \frac{T_{a+k}}{T_a}.$$

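b. Integers $\lambda$ and $\mu$ being determined by the inequalities

$$\lambda - 1 < np \le \lambda, \qquad \mu - 1 < np + n\varepsilon \le \mu$$

the probabilities $A$ and $C$ of the inequalities

$$0 \le \frac{m}{n} - p < \varepsilon; \qquad \frac{m}{n} - p \ge \varepsilon$$

are represented, respectively, by the sums

$$A = T_\lambda + T_{\lambda+1} + \cdots + T_{\mu-1}$$

$$C = T_\mu + T_{\mu+1} + \cdots + T_n$$

the first of which contains $\mu - \lambda = g$ terms. Combining terms of the
second sum into groups of $g$ terms (the last group may consist of less than
$g$ terms) and setting for brevity

$$A_1 = T_\mu + T_{\mu+1} + \cdots + T_{\mu+g-1}$$
$$A_2 = T_{\mu+g} + T_{\mu+g+1} + \cdots + T_{\mu+2g-1}$$
$$A_3 = T_{\mu+2g} + T_{\mu+2g+1} + \cdots + T_{\mu+3g-1}$$

and so on, we shall have

$$C = A_1 + A_2 + A_3 + \cdots$$

and at the same time

$$\frac{A_1}{A} < \frac{T_\mu}{T_\lambda}; \qquad \frac{A_2}{A_1} < \frac{T_\mu}{T_\lambda}; \qquad \frac{A_3}{A_2} < \frac{T_\mu}{T_\lambda}; \quad \ldots \tag{2}$$

The ratio

$$\frac{A_1}{A} = \frac{T_{\lambda+g} + T_{\lambda+g+1} + \cdots + T_{\lambda+2g-1}}{T_\lambda + T_{\lambda+1} + \cdots + T_{\lambda+g-1}}$$

is less than the greatest of numbers

$$\frac{T_{\lambda+g}}{T_\lambda}, \quad \frac{T_{\lambda+g+1}}{T_{\lambda+1}}, \quad \ldots, \quad \frac{T_{\lambda+2g-1}}{T_{\lambda+g-1}}.$$

But by inequality (1)

$$\frac{T_{\lambda+g}}{T_\lambda} > \frac{T_{\lambda+g+1}}{T_{\lambda+1}} > \cdots > \frac{T_{\lambda+2g-1}}{T_{\lambda+g-1}};$$

hence

$$\frac{A_1}{A} < \frac{T_{\lambda+g}}{T_\lambda} = \frac{T_\mu}{T_\lambda}.$$

Similarly,

$$\frac{A_2}{A_1} < \frac{T_{\mu+g}}{T_\mu}; \qquad \frac{A_3}{A_2} < \frac{T_{\mu+2g}}{T_{\mu+g}}; \quad \ldots$$

and again by inequality (1)

$$\frac{T_{\mu+g}}{T_\mu} < \frac{T_\mu}{T_\lambda}; \qquad \frac{T_{\mu+2g}}{T_{\mu+g}} < \frac{T_\mu}{T_\lambda}; \quad \ldots$$

Consequently

$$\frac{A_2}{A_1} < \frac{T_\mu}{T_\lambda}; \qquad \frac{A_3}{A_2} < \frac{T_\mu}{T_\lambda}; \quad \ldots$$

and inequalities (2) are established.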
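c. For $x \ge \lambda$

$$\frac{T_{x+1}}{T_x} < 1.$$

It suffices to show that

$$\frac{T_{\lambda+1}}{T_\lambda} = \frac{n - \lambda}{\lambda + 1}\cdot\frac{p}{q} < 1.$$

As $\lambda \ge np$

$$\frac{n - \lambda}{\lambda + 1}\cdot\frac{p}{q} \le \frac{npq}{npq + q}$$

which shows that $\dfrac{T_{\lambda+1}}{T_\lambda} < 1$.

The inequality just established shows that in the following expression:

$$\frac{T_\mu}{T_\lambda} = \frac{T_\mu}{T_{\mu-1}}\cdot\frac{T_{\mu-1}}{T_{\mu-2}}\cdots\frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}}\cdot\frac{T_{\mu-\alpha}}{T_{\mu-\alpha-1}}\cdots\frac{T_{\lambda+1}}{T_\lambda}$$

all the factors are $< 1$. Consequently, if we retain $\alpha \le g$ first factors
only, replacing the others by 1, we get

$$\frac{T_\mu}{T_\lambda} \le \frac{T_\mu}{T_{\mu-1}}\cdot\frac{T_{\mu-1}}{T_{\mu-2}}\cdots\frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}}.$$

Moreover,

$$\frac{T_\mu}{T_{\mu-1}} < \frac{T_{\mu-1}}{T_{\mu-2}} < \cdots < \frac{T_{\mu-\alpha+1}}{T_{\mu-\alpha}}$$

whence the following important inequality results:

$$\frac{T_\mu}{T_\lambda} \le \left(\frac{n - \mu + \alpha}{\mu - \alpha + 1}\cdot\frac{p}{q}\right)^{\alpha}. \tag{3}$$

Here $\alpha$ is an arbitrary positive integer $\le g$.
Now, let $\varepsilon$ be an arbitrary positive number. Then we can show that
for

$$n \ge \frac{\alpha(1 + \varepsilon) - q}{\varepsilon(p + \varepsilon)} \tag{4}$$

we have both

$$\text{(i)} \quad \frac{n - \mu + \alpha}{\mu - \alpha + 1}\cdot\frac{p}{q} \le \frac{p}{p + \varepsilon} \qquad \text{and} \qquad \text{(ii)} \quad \alpha \le g.$$

Since $\mu \ge np + n\varepsilon$, it suffices to show that (i) is satisfied for $\mu = np + n\varepsilon$.
If $\mu = np + n\varepsilon$ inequality (i) is equivalent to

$$\frac{nq - n\varepsilon + \alpha}{np + n\varepsilon - \alpha + 1} \le \frac{q}{p + \varepsilon}$$

or, after obvious simplifications,

$$n\varepsilon(p + \varepsilon) \ge \alpha(1 + \varepsilon) - q.$$

But this inequality follows from (4). To establish (ii), since $\alpha$ and $g$
are integers, it suffices to show that $\alpha < g + 1$. But $\mu \ge np + n\varepsilon$,
$\lambda < np + 1$ and consequently $g + 1 > n\varepsilon$. Hence (ii) will be
established if we can show that $n\varepsilon \ge \alpha$, which by virtue of (4) will be true if

$$\frac{\alpha(1 + \varepsilon) - q}{p + \varepsilon} \ge \alpha$$

that is, if

$$\alpha(1 + \varepsilon) - q \ge \alpha p + \alpha\varepsilon$$

or $\alpha q - q \ge 0$, which is obviously true, $\alpha$ being a positive integer.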

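d. The auxiliary integer $\alpha$ is still at our disposal. Given an arbitrary
positive number $\eta < 1$ we shall determine $\alpha$ as the least integer satisfying
the inequality

$$\left(\frac{p}{p + \varepsilon}\right)^{\alpha} \le \eta.$$

At the same time

$$\alpha < 1 + \frac{\log\dfrac{1}{\eta}}{\log\left(1 + \dfrac{\varepsilon}{p}\right)}$$

and since $\log\left(1 + \dfrac{\varepsilon}{p}\right) > \dfrac{\varepsilon}{p + \varepsilon}$, we shall have

$$\alpha < 1 + \frac{p + \varepsilon}{\varepsilon}\log\frac{1}{\eta}$$

and

$$\frac{\alpha(1 + \varepsilon) - q}{\varepsilon(p + \varepsilon)} < \frac{1 + \varepsilon}{\varepsilon^2}\log\frac{1}{\eta} + \frac{1}{\varepsilon}.$$

Consequently, if

$$n \ge \frac{1 + \varepsilon}{\varepsilon^2}\log\frac{1}{\eta} + \frac{1}{\varepsilon} \tag{5}$$

then by virtue of (i) and (3)

$$\frac{T_\mu}{T_\lambda} \le \eta$$

and by virtue of (2)

$$A_1 < A\eta; \qquad A_2 < A_1\eta < A\eta^2; \qquad A_3 < A_2\eta < A\eta^3; \quad \ldots$$

whence

$$C < A\eta + A\eta^2 + A\eta^3 + \cdots = \frac{A\eta}{1 - \eta}. \tag{6}$$

This inequality holds if $n$ satisfies (5). No trace of the auxiliary
integer $\alpha$ is left.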

e. Let us now consider the inequalities

$$-\varepsilon < \frac{m}{n} - p < 0 \qquad \text{and} \qquad \frac{m}{n} - p \le -\varepsilon$$

and introduce their respective probabilities $B$ and $D$. These inequalities
are equivalent to

$$0 < \frac{n - m}{n} - q < \varepsilon \qquad \text{and} \qquad \frac{n - m}{n} - q \ge \varepsilon.$$

It is apparent that we can interpret $B$ or $D$ as probabilities that the
number of occurrences $m' = n - m$ of the event $F$ opposite to $E$ in $n$ trials will
satisfy either the inequality $0 < \dfrac{m'}{n} - q < \varepsilon$ or $\dfrac{m'}{n} - q \ge \varepsilon$. Since
the right-hand side of (5) contains only given numbers $\varepsilon$, $\eta$ it is clear that

$$D < \frac{B\eta}{1 - \eta} \tag{7}$$

if (5) is satisfied.

Now $A + B = P$ is the probability of the inequality

$$\left|\frac{m}{n} - p\right| < \varepsilon$$

and $C + D = Q$ is the probability of the opposite inequality

$$\left|\frac{m}{n} - p\right| \ge \varepsilon.$$

Hence $P + Q = 1$. Moreover, by (6) and (7)

$$Q < \frac{P\eta}{1 - \eta}.$$

Consequently,

$$P + \frac{P\eta}{1 - \eta} > 1$$

or

$$P > 1 - \eta$$

if only

$$n \ge \frac{1 + \varepsilon}{\varepsilon^2}\log\frac{1}{\eta} + \frac{1}{\varepsilon}.$$

This completes the proof of Bernoulli’s theorem.

For example, if $p = q = \tfrac{1}{2}$ and $\varepsilon = 0.01$, $\eta = 0.001$ we get from (5)

$$n \ge 69{,}869$$

which shows that in 69,869 trials or more there are at least 999 chances
against 1 that the relative frequency will differ from $\tfrac{1}{2}$ by less than $\tfrac{1}{100}$.
The number 69,869 found as a lower limit of the number of trials is
much too large. A much smaller number of trials would suffice to fulfill
all the requirements. From a practical standpoint, it is important to
find as low a limit as possible for the necessary number of trials (given $\varepsilon$
and $\eta$). With this problem we shall deal in the next chapter.
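Inequality (5) and the remark that it is far from sharp can both be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the proof: `bernoulli_trials_bound` evaluates the right-hand side of (5) with natural logarithms, and `prob_within` sums the exact binomial terms $T_m$ using the log-gamma function. Both names are hypothetical, and the choice of 40,000 trials is only an example.

```python
import math


def bernoulli_trials_bound(eps, eta):
    """Smallest integer n satisfying inequality (5)."""
    return math.ceil((1.0 + eps) / eps ** 2 * math.log(1.0 / eta) + 1.0 / eps)


def prob_within(n, p, eps):
    """Exact probability that |m/n - p| < eps in n trials (sum of binomial terms)."""
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for m in range(n + 1):
        if abs(m / n - p) < eps:
            log_term = (math.lgamma(n + 1) - math.lgamma(m + 1)
                        - math.lgamma(n - m + 1) + m * log_p + (n - m) * log_q)
            total += math.exp(log_term)
    return total


if __name__ == "__main__":
    n = bernoulli_trials_bound(0.01, 0.001)
    print(n, prob_within(n, 0.5, 0.01))
    # a smaller number of trials, chosen only as an example
    print(prob_within(40000, 0.5, 0.01))
```

Comparing the two printed probabilities with $1 - \eta = 0.999$ shows how much room the bound (5) leaves.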

2. Bernoulli’s theorem states that for arbitrarily given e and rj there 
exists a number no(e, v) such that for any single value n > no(c, 17 ) the 
probability of the inequality 


will be greater than 1 — r;. The question naturally arises, whether for 
given c and 17 a number N(€y rf) depending upon c and rj can be found such 
that the probability of simultaneous inequalities 


for all n > N{e, 17 ) will still be greater than I — rj. The following theo¬ 
rem due to Cantelli shows that this question can be answered positively. 

Cantelli’s Theorem. For given $\epsilon < 1$, $\eta < 1$ let N be an integer
satisfying the inequality

$$N > \frac{2}{\epsilon^2}\log\frac{4}{\epsilon^2\eta} + 2.$$

The probability that the relative frequencies of an event E will differ from
p by less than $\epsilon$ in the Nth and all the following trials is greater than $1 - \eta$.
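Proof. We shall prove first that the probability $Q_n$ of the inequality

$$\left|\frac{m}{n} - p\right| \geq \epsilon$$

will always be less than $2e^{-\frac{1}{2}n\epsilon^2}$. According to results proved in the
preceding section for any $\eta > 0$

$$Q_n < \eta$$

if

$$n \geq \frac{1 + \epsilon}{\epsilon^2}\log\frac{1}{\eta} + \frac{1}{\epsilon}.$$

This inequality, if we take $\eta = 2e^{-\frac{1}{2}n\epsilon^2}$, becomes

$$n \geq \frac{1 + \epsilon}{2}n + \frac{1}{\epsilon}\left(1 - \frac{1 + \epsilon}{\epsilon}\log 2\right)$$

and in this form it is evident, since for $\epsilon < 1$

$$1 - \frac{1 + \epsilon}{\epsilon}\log 2 < 1 - 2\log 2 < 0.$$

Hence, as stated,

$$Q_n < 2e^{-\frac{1}{2}n\epsilon^2}. \tag{8}$$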

The event A, in which we are interested, consists in simultaneous
fulfillment of all the inequalities

$$\left|\frac{m}{n} - p\right| < \epsilon$$

for n = N, N + 1, N + 2, . . . . The opposite event B consists in
the fulfillment of at least one of the inequalities

$$\left|\frac{m}{n} - p\right| \geq \epsilon$$

where n can coincide either with N, or with N + 1, or with N + 2, . . . .
The probability of B, which we shall denote by R, certainly does not
exceed the sum of the probabilities of all the inequalities

$$\left|\frac{m}{n} - p\right| \geq \epsilon$$

for n = N, N + 1, N + 2, . . . .
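Consequently, referring to (8),

$$R < 2\sum_{n=N}^{\infty} e^{-\frac{1}{2}n\epsilon^2} = \frac{2e^{-\frac{1}{2}N\epsilon^2}}{1 - e^{-\frac{1}{2}\epsilon^2}}.$$

To satisfy the inequality

$$\frac{2e^{-\frac{1}{2}N\epsilon^2}}{1 - e^{-\frac{1}{2}\epsilon^2}} < \eta$$

it suffices to take

$$N > \frac{2}{\epsilon^2}\log\frac{2}{\eta} + \frac{2}{\epsilon^2}\log\frac{1}{1 - e^{-\frac{1}{2}\epsilon^2}}.$$

Now

$$\frac{1}{1 - e^{-\frac{1}{2}\epsilon^2}} < \frac{2}{\epsilon^2}e^{\epsilon^2}.$$

Consequently, if

$$N > \frac{2}{\epsilon^2}\log\frac{4}{\epsilon^2\eta} + 2$$

we shall have $R < \eta$ and at the same time the probability of A will be
greater than $1 - \eta$, which proves Cantelli’s theorem.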
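
To see what Cantelli's condition requires numerically, one may evaluate it for the same $\epsilon = 0.01$, $\eta = 0.001$ used in the example of Section 1. This is a minimal sketch under stated assumptions: the function names are hypothetical, natural logarithms are used, and the second function merely evaluates the closed form of the geometric series that bounds R in the proof.

```python
import math

def cantelli_bound(eps: float, eta: float) -> int:
    """Smallest integer N with N > (2 / eps^2) * log(4 / (eps^2 * eta)) + 2."""
    bound = 2 / eps**2 * math.log(4 / (eps**2 * eta)) + 2
    return math.floor(bound) + 1

def tail_sum_bound(N: int, eps: float) -> float:
    """Closed form of 2 * sum_{n >= N} exp(-n * eps^2 / 2), the upper limit for R."""
    x = eps**2 / 2
    # -expm1(-x) equals 1 - exp(-x), computed accurately for small x
    return 2 * math.exp(-N * x) / (-math.expm1(-x))

N = cantelli_bound(0.01, 0.001)
print(N, tail_sum_bound(N, 0.01))
```

The value of N obtained in this way is several times larger than the limit 69,869 found for a single n, as one would expect when all the inequalities from the Nth trial onward must hold simultaneously.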


Significance of Bernoulli’s Theorem 

3. As was indicated in the Introduction, one of the most important 
problems in the theory of probability consists in the discovery of cases 
where the probability is very near to 0 or, on the contrary, very near to 1, 
because cases with very small or very great probability may have real
practical interest. In Bernoulli’s theorem we have a case of this kind; 
the theorem shows that with the probability approaching as near to 1 
or certainty as we please, we may expect that in a sufficiently long 
series of independent trials with constant probability, the relative fre¬ 
quency of an event will differ from that probability by less than any 
specified number, no matter how small. But it lies in the nature of the 
idea of mathematical probability, that when it is near 1, or, on the con¬ 
trary, very small, we may consider an event with such probability as 
practically certain in the first case, and almost impossible in the second. 
The reason is purely empirical. 

To illustrate what we mean, let us consider an indefinite series of
independent trials, in which the probability of a certain event remains
constantly equal to 1/2. It can be shown that if the number of trials
is, for instance, 40,000 or more, we may expect with a probability > 0.999
that the relative frequency of the event will differ from 1/2 by less than
0.01. In other words, we are entitled to bet at least 999 against 1 that
the actual number of occurrences will lie between the limits 0.49n and
0.51n if $n \geq 40{,}000$. If we could make a positive statement of this
kind without any mention of probability, we should be offering an ideal
scientific prediction. However, our knowledge in this case is incomplete
and all we are entitled to state is this: we are more sure to be right in
predicting the above limits for the number of occurrences than in expecting
to draw a white ball from an urn containing 999 white and only 1
black ball.
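
The statement about 40,000 trials can be checked by summing the terms of the binomial distribution directly. What follows is a sketch for illustration only and is not the method by which the text arrives at the number; the function name `prob_within` is an assumption, and logarithms of factorials (`math.lgamma`) are used solely to avoid overflow.

```python
import math

def prob_within(n: int, p: float, eps: float) -> float:
    """Probability that |m/n - p| < eps, where m is the number of
    occurrences in n independent trials with constant probability p."""
    q = 1.0 - p
    total = 0.0
    for m in range(n + 1):
        if abs(m / n - p) < eps:
            # logarithm of C(n, m) * p^m * q^(n - m)
            log_term = (math.lgamma(n + 1) - math.lgamma(m + 1)
                        - math.lgamma(n - m + 1)
                        + m * math.log(p) + (n - m) * math.log(q))
            total += math.exp(log_term)
    return total

# p = 1/2, eps = 0.01, n = 40,000 as in the illustration above
print(prob_within(40_000, 0.5, 0.01))
```

The printed value should exceed 0.999, in agreement with the odds of 999 against 1 quoted above.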

In practical matters, where our actions almost never can be directed 
with perfect confidence, even incomplete knowledge may be taken as a 
sure guide. Whoever has tried to win on a single ticket out of 10,000 
knows from experience that it is virtually impossible. Now the convic¬ 
tion of impossibility would be still greater if one tried to win on a single 
ticket out of 1,000,000. 

In the light of such examples, we understand what value may be 
attached to statements derived from Bernoulli's theorem: Although the 
fact we expect is not bound to happen, the probability of its happening 
is so great that it may really be considered as certain. Once in a great 
while facts may happen contrary to our expectations, but such rare excep¬ 
tions cannot outweigh the advantages in everyday life of following the 
indications of Bernoulli's theorem. And herein lies its immense practical 
value and the justification of a science like the theory of probability. 

It should, however, be borne in mind that little, if any, value can be 
attached to practical applications of Bernoulli's theorem, unless the 
conditions presupposed in this theorem are at least approximately ful¬ 
filled: independence of trials and constant probability of an event for 
every trial. And in questions of application it is not easy to be sure 
whether one is entitled to make use of Bernoulli's theorem; consequently, 
it is too often used illegitimately. 

It is easy to understand how essential it is to discover propositions 
of the same character under more general conditions, paying especial 
attention to the possible dependence of trials. There have been valuable 
achievements in this direction. In the proper place, we shall discuss the 
more important generalizations of Bernoulli's theorem. 

4. When the probability of an event in a single experiment is known, 
Bernoulli's theorem may serve as a guide to indicate approximately how 
often this event can be expected to occur if the same experiments are 
repeated a considerable number of times under nearly the same condi¬ 
tions. When, on the contrary, the probability of an event is unknown 
and the number of experiments is very large, the relative frequency of 
that event may be taken as an approximate value of its probability. 
Bernoulli himself, in establishing his theorem, had in mind the approxi¬ 
mate evaluation of unknown probabilities from repeated experiments. 
That is evident from his explanations preceding the statement of the 
theorem itself and its proof. Inasmuch as these explanations are interesting
in themselves, and present the original thoughts of the great discoverer,
we deem it advisable here to give a free translation from Bernoulli's
book. After calling attention to the fact that only in a few cases can
probabilities be found a priori, Bernoulli proceeds as follows: 

So, for example, the number of cases for dice is known. Evidently there are
as many cases for each die as there are faces, and all these cases have an equal 
chance to materialize. For, by virtue of the similitude of faces and the uniform 
distribution of weight in a die, there is no reason why one face should show up
more readily than another, as there would be if the faces had a different shape 
or if one part of a die were made of heavier material than another. So one knows 
the number of cases when a white or a black ticket can be drawn from an urn, 
and besides, it is known that all these cases are equally possible, because the num¬ 
bers of tickets of both kinds are determined and known, and there is no apparent 
reason why one of these tickets could be drawn more readily than any other.
But, I ask you, who among mortals will ever be able to define as so many cases, 
the number, e.g., of the diseases which invade innumerable parts of the human
body at any age and can cause our death? And who can say how much more
easily one disease than another (plague than dropsy, dropsy than fever) can
kill a man, to enable us to make conjectures about the future state of life or 
death? Who, again, can register the innumerable cases of changes to which the 
air is subject daily, to derive therefrom conjectures as to what will be its state 
after a month or even after a year? Again, who has sufficient knowledge of the
nature of the human mind or of the admirable structure of our body to be able, 
in games depending on acuteness of mind or agility of body, to enumerate cases 
in which one or another of the participants will win? Since such and similar 
things depend upon completely hidden causes, which, besides, by reason of the 
innumerable variety of combinations will forever escape our efforts to detect 
them, it would plainly be an insane attempt to get any knowledge in this fashion. 

However, there is another way to obtain what we want. And what is impossi¬ 
ble to get a priori, at least can be found a posteriori; that is, by registering the 
results of observations performed a great many times. Because it must be pre¬ 
sumed that something may occur or not occur as many times as it had previously 
been observed to occur or not occur under similar conditions. For instance, if, 
in the past, 300 men of the same age and physical build as Titus is now, were 
investigated, and it were found that 200 of them had died within a decade, the 
others continuing to enjoy life past this term, one could pretty safely conclude 
that there are twice as many cases for Titus to pay his debt to nature within the 
next decade than to survive beyond this term. So it is, if somebody for many 
preceding years had observed the weather and noticed how many times it was
fair or rainy; or if somebody attended games played by two persons a great many 
times and noticed how often one or the other won; by these very observations he 
would be able to discover the ratio of cases which in the future might favor the 
occurrence or failure of the same event under similar circumstances. 

And this empirical way of determining the number of cases by experiments is 
neither new nor unusual. For the author of the book “Ars cogitandi,” a man 
of great acumen and ingenuity, in Chap. 12 recommends a similar procedure, 
and everybody does the same in daily practice. Moreover, it cannot be concealed
that for reasoning in this fashion about some event, it is not sufficient to
make a few experiments, but a great quantity of experiments is required; because
even the most stupid ones by some natural instinct and without any previous
instruction (which is rather remarkable) know that the more experiments are
made, the less is the danger to miss the scope.

Although this is naturally known to anyone, the proof based on scientific 
principles is by no means trivial, and it is our duty now to explain it. However, 
I would consider it a small achievement if I could only prove what everybody 
knows anyway. There remains something else to be considered, which perhaps 
nobody has even thought of. Namely, it remains to inquire, whether by thus 
augmenting the number of experiments the probability of getting a genuine ratio 
between numbers of cases, in which some event may occur or fail, also augments 
itself in such a manner as finally to surpass any given degree of certitude; or 
whether the problem, so to speak, has its own asymptote; that is, there exists a 
degree of certitude which never can be surpassed no matter how the observations 
are multiplied; for instance, that it never is possible to have a probability greater
than 1/2, 2/3 or 3/4 that the real ratio has been attained. To illustrate this by an

example, suppose that, without your knowledge, 3,000 white stones and 2,000 
black stones are concealed in a certain urn, and you try to discover their numbers 
by drawing one stone after another (each time putting back the stone drawn 
before taking the next one, in order not to change the number of stones in the 
urn) and notice how often a white or a black stone appears. The question is, 
can you make so many drawings as to make it 10, or 100, or 1,000, etc., times 
more probable (that is, morally certain) that the ratio of frequencies of white and 
black stones will be 3 to 2, as is the case with the number of stones in the urn, 
than any other ratio different from that? If this were not true, I confess nothing 
would be left of our attempt to explore the number of cases by experiments. 
But if this can be attained and moral certitude can finally be acquired (how that 
can be done I shall show in the next chapter), we shall have cases enumerated a 
posteriori with almost the same confidence as if they were known a priori. And 
that, for practical purposes, where “morally certain” is taken for “absolutely
certain” by Axiom 9, Chap. II, is abundantly sufficient to direct our conjectures 
in any contingent matter not less scientifically than in games of chance. 

For if instead of an urn we take the air or the human body, that contain in 
themselves sources of various changes or diseases as the urn contains stones, we 
shall be able in the same manner to determine by observations how much more 
likely one event is to happen than another in these subjects. 

To avoid misunderstanding, one must bear in mind that the ratio of cases 
which we want to determine by experiments should not be taken in the sense of a 
precise and indivisible ratio (for then just the contrary would happen, and the 
probability of attaining a true ratio would diminish with the increasing number of 
observations) but as an approximate one; that is, within two limits, which, 
however, can be taken as near as we wish to each other. For instance, if, in the
case of the stones, we take pairs of ratios 301/200 and 299/200, or 3001/2000 and
2999/2000, etc., it can be shown that it will be more probable than any degree of
probability that the ratio found in experiments will fall within these limits than 
outside of them. Such, therefore, is the problem which we have decided to 
publish here, now that we have struggled with it for about twenty years. The
novelty of this problem as well as its great utility, combined with equal difficulty,
may add to the weight and value of other parts of this doctrine. (“Ars Conjectandi,”
pars quarta, Cap. IV, pp. 224-227.)

Application to Games of Chance

5. One of the cases in which the conditions for application of Bernoulli’s
theorem are fulfilled is that of games of chance. It is not out
of place to discuss the question of the commercial values of games from
the standpoint of Bernoulli’s theorem. “Game of chance” is the term
we apply to any enterprise which may give us profit or may cause us 
loss, depending on chance, the probabilities of gain or loss being known. 
The following considerations can be applied, therefore, to more serious 
questions and not only to games played for pastime or for the sake of 
gaining money, as in gambling. 

Suppose that, by the conditions of the game, a player can win a
certain sum a of money, with the probability p; or can lose another
sum b with the probability q = 1 − p.

If this game can be repeated any number of times under the same 
conditions, the question arises as to the probability for a player to gain 
or lose a sum of money not below a given limit. Let us denote by n 
the total number of games, and by m the number of times the player 
wins. Considering a loss as a negative gain, his total gain will be 

$$K = ma - (n - m)b.$$

It is convenient to introduce instead of m another number $\alpha$ defined by

$$\alpha = m - np$$

and called “discrepancy.” Expressed in terms of $\alpha$ the preceding expression
for the gain becomes

$$K = n(pa - qb) + (a + b)\alpha.$$

The expression

$$E = pa - qb$$


entering as the coefficient of n has, as we shall see, an important bearing 
on the conclusion as to the commercial value of the game. It is called the 
“mathematical expectation” of the player. Suppose at first that this 
expectation is positive. By Bernoulli’s theorem the probability for a
discrepancy less than $-n\epsilon$, $\epsilon$ being an arbitrary positive number, is
smaller than any given number, provided, of course, the number of games
is sufficiently large. At the same time, with the probability approaching
1 as near as we please, we may expect the discrepancy to be $\geq -n\epsilon$.
However, if this is the case, the total gain will surpass the number

$$n[E - \epsilon(a + b)]$$

which, for sufficiently large n, itself is greater than any specified positive
number. It is supposed, of course, that $\epsilon$ is small enough to make the
difference

$$E - \epsilon(a + b)$$

positive. And that means that the player whose mathematical expecta¬ 
tion is positive may expect with a probability approaching certainty as 
near as we please to gain an arbitrarily large amount of money if nothing 
prevents him from playing a sufficient number of games. 
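
The relation between the total gain K, the mathematical expectation E and the discrepancy can be verified on a small numerical case. The game below (win 2 with probability 2/5, lose 1 with probability 3/5) and the function names are hypothetical, chosen only for illustration; exact fractions are used so that the identity holds without rounding.

```python
from fractions import Fraction

def expectation(p, a, b):
    """Mathematical expectation E = p*a - q*b of a single game."""
    return p * a - (1 - p) * b

def total_gain(n, m, a, b):
    """Total gain K = m*a - (n - m)*b after n games with m wins."""
    return m * a - (n - m) * b

def gain_lower_limit(n, p, a, b, eps):
    """The number n*[E - eps*(a + b)], a lower limit for the gain
    whenever the discrepancy is >= -n*eps."""
    return n * (expectation(p, a, b) - eps * (a + b))

# Hypothetical game and outcome: 380 wins in 1,000 games.
p, a, b = Fraction(2, 5), 2, 1
n, m = 1000, 380
alpha = m - n * p                       # discrepancy, here -20
K = total_gain(n, m, a, b)

# K = n*E + (a + b)*alpha, the expression derived in the text
assert K == n * expectation(p, a, b) + (a + b) * alpha
print(expectation(p, a, b), K, gain_lower_limit(n, p, a, b, Fraction(1, 50)))
```

In this instance the discrepancy is exactly $-n\epsilon$ for $\epsilon = 1/50$, so the gain coincides with the lower limit; any larger number of wins would give a greater gain.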

On the contrary, by a similar argument, we can see that in case of 
a negative mathematical expectation, the player has an arbitrarily small 
probability to escape a loss of an arbitrarily large amount of money, 
again under the condition that he plays a sufficiently large number of 
games. 

Finally, if the mathematical expectation is 0, it is impossible to make 
any definite statement concerning the gain or loss by the player, except 
that it is very unlikely that the amount of gain or loss will be considerable 
compared with the number of games. 

It follows from this discussion that the game is certainly favorable 
for the player if his mathematical expectation is positive, and unfavorable 
if it is negative. In case the mathematical expectation is 0, neither
of the parties participating in the game has a decided advantage and then 
the game is called equitable. Usually, games serving as amusements are 
equitable. On the contrary, all of the games operated for commercial 
purposes by individuals or corporations are expressly made to be profita¬ 
ble for the administration; that is, the mathematical expectation of the 
administration of a game operated for lucrative purposes is positive at 
each single turn of the game and, correspondingly, the expectation of any 
gambler is negative. This confirms the common observation that those 
gamblers who extend their gambling over large numbers of games are 
almost inevitably ruined. At the same time, the theory agrees with 
the fact that great profits are derived by the administrations of gaming 
places. 

A good illustration is afforded by the French lottery mentioned on 
page 19, which, as is well known, was a very profitable enterprise operated 
by the French government. Now, if we consider the mathematical 
expectation of ticket holders in that lottery, we find that it was negative 
in all cases; namely, denoting by M the sum paid for tickets, we find the 
following expectations: 

On 1 ticket $\left(\frac{15}{18} - 1\right)M = -\frac{1}{6}M$,

On 2 tickets $\left(\frac{540}{801} - 1\right)M = -\frac{29}{89}M$,

On 3 tickets $\left(\frac{5500}{11748} - 1\right)M = -\frac{142}{267}M$,

and so forth.

On the other hand, the expectation of the administration was always
positive, and because of the great number of persons taking part in this 
lottery, the number of games played by the administration was enormous, 
and it was assured of a steady and considerable income. This was an 
enterprise avowedly operated for the purpose of gambling, but the same 
principles underlie the operations of institutions having great public 
value, such as insurance companies, which, to secure their income, always 
reserve certain advantages for themselves. 
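
The expectations of ticket holders can be recomputed from the rules of the lottery. The sketch below rests on assumptions not restated in this section: that 5 numbers are drawn out of 90, and that a stake on 1, 2 or 3 numbers is repaid 15, 270 and 5,500 times, respectively, when all the chosen numbers appear. These are the values usually quoted for the French lottery and should be checked against the description on page 19.

```python
from fractions import Fraction
from math import comb

# ASSUMED payout multiples for stakes on 1, 2 and 3 numbers.
PAYOUT = {1: 15, 2: 270, 3: 5500}

def win_probability(k: int, drawn: int = 5, total: int = 90) -> Fraction:
    """Probability that k chosen numbers are all among the numbers drawn."""
    return Fraction(comb(drawn, k), comb(total, k))

def expectation_per_unit(k: int) -> Fraction:
    """Expectation of the ticket holder per unit of money staked."""
    return PAYOUT[k] * win_probability(k) - 1

for k in (1, 2, 3):
    print(k, win_probability(k), expectation_per_unit(k))
```

Under these assumptions the expectation is negative in every case and grows worse as the promised prize grows larger.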

Experimental Verification of Bernoulli’s Theorem 

6. Bernoulli’s theorem, like any other mathematical proposition, is 
a deduction from ideal premises. To what extent these premises may be 
considered as a good approximation to reality can be decided only by 
experiments. Several experiments established for the purpose of testing 
various theoretical statements derived from general propositions of the 
theory of probability, are reported by different authors. Here we shall 
discuss those purporting to test Bernoulli’s theorem. 

I. Buffon, the French naturalist of the eighteenth century, tossed a
coin 4,040 times and obtained 2,048 heads and 1,992 tails. Assuming
that his coin was ideal, we have a probability of 1/2 for either heads or
tails. Now, the relative frequencies obtained by his experiments are:

$\frac{2048}{4040} = 0.507$ for heads
$\frac{1992}{4040} = 0.493$ for tails

and they differ very little from the corresponding probabilities, 0.500. 
In this case, the conclusions one might derive from Bernoulli’s theorem 
are verified in a very satisfactory manner. 

II. De Morgan, in his book “Budget of Paradoxes” (1872), reports 
the results of four similar experiments. In each of them a coin was 
tossed 2,048 times and the observed frequencies of heads were, respec¬ 
tively, 1,061, 1,048, 1,017, 1,039. The relative frequencies corresponding 
to these numbers are 

$$\frac{1061}{2048} = 0.518; \quad \frac{1048}{2048} = 0.512; \quad \frac{1017}{2048} = 0.497; \quad \frac{1039}{2048} = 0.507.$$

The agreement with the theory again is satisfactory. 

III. Charlier, in his book “Grundzüge der mathematischen Statistik,”
reports the results of 10,000 drawings of one playing card out of a full 
deck. Each card drawn was returned to the deck before the next draw¬ 
ing. The actual result of these experiments was that black cards 
appeared 4,933 times, and consequently the frequency of red cards was 
5,067. The relative frequencies in this instance are: 

$\frac{4933}{10000} = 0.4933$ for a black card
$\frac{5067}{10000} = 0.5067$ for a red card





and they differ but slightly from the probability, 0.5000, that the card
drawn will be black or red. The agreement between theory and experiment
in this case, too, is satisfactory.
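
The relative frequencies quoted in experiments I to III are simple quotients and can be recomputed from the reported counts. The short sketch below does only that; the dictionary layout and the function name `rel_freq` are assumptions of this illustration.

```python
def rel_freq(count: int, trials: int) -> float:
    """Relative frequency of an event observed `count` times in `trials` trials."""
    return count / trials

# (number of occurrences, number of trials) as reported in the text
experiments = {
    "Buffon, heads":     (2048, 4040),
    "Buffon, tails":     (1992, 4040),
    "De Morgan, 1st":    (1061, 2048),
    "De Morgan, 2nd":    (1048, 2048),
    "De Morgan, 3rd":    (1017, 2048),
    "De Morgan, 4th":    (1039, 2048),
    "Charlier, black":   (4933, 10000),
    "Charlier, red":     (5067, 10000),
}

for name, (count, trials) in experiments.items():
    f = rel_freq(count, trials)
    print(f"{name:16s} {f:.4f}  deviation from 1/2: {f - 0.5:+.4f}")
```

In every case the deviation from 1/2 is less than 0.02 in absolute value.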

IV. The author of this book made the following experiment with 
playing cards: After excluding the 12 face cards from the pack, 4 cards 
were drawn at a time from the remaining 40, and the number of trials 
was carried to 7,000. The number of times in each thousand that the 
four cards belonged to different suits, was: 

I II III IV V VI VII 

113 113 103 105 105 118 108 

Altogether the frequency of such cases was 765 in 7,000 trials, whence 
we find for the relative frequency 

$$\frac{765}{7000} = 0.1093$$

while the probability for taking 4 cards belonging to different suits is

$$\frac{1000}{9139} = 0.1094.$$
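
The probability 1000/9139 follows from counting: one card may be taken from each of the four suits of 10 cards in $10^4$ ways, while 4 cards can be taken from 40 in $C_{40}^4$ ways. A brief sketch, with variable names chosen here for illustration:

```python
from fractions import Fraction
from math import comb

# Observed numbers of favorable cases in each thousand of trials
counts = [113, 113, 103, 105, 105, 118, 108]

# One card from each of the 4 suits (10 cards each) out of all 4-card selections
p_different_suits = Fraction(10**4, comb(40, 4))

observed = Fraction(sum(counts), 7000)
print(p_different_suits, float(p_different_suits), float(observed))
```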

V. In J. L. Coolidge's “Introduction to Mathematical Probability,”
one finds a reference to an experiment made by Lieutenant R. S. Hoar, 
U.S.A., but the reported results are incomplete. The author of this book 
repeated the same experiment which consisted in 1,000 drawings of 5 cards 
at a time, from a full pack of 52 cards. The results were: 503 times the 
5 cards were each of different denominations; 436 times 2 were of the same 
denomination with 3 scattered; 45 times there were 2 pairs of 2 different 
denominations and 1 odd card; 14 times 3 were of the same denomination 
with 2 scattered; 2 times there were 2 of one denomination and 3 of 
another. The remaining possible combination, 4 cards of the same 
denomination with 1 odd, never appeared. The probabilities of these 
different cases are, respectively, 

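$$\frac{2112}{4165} = 0.507; \quad \frac{352}{833} = 0.423; \quad \frac{198}{4165} = 0.048;$$

$$\frac{88}{4165} = 0.021; \quad \frac{6}{4165} = 0.001; \quad \frac{1}{4165} = 0.000.$$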

The corresponding theoretical frequencies are 507, 423, 48, 21, 1, 0, 
while the observed frequencies were 503, 436, 45, 14, 2, 0. The dis¬ 
crepancies are generally small and the greatest of them, 13, is still within 
reasonable limits. Deeper investigation shows that the probability that 
a discrepancy will not exceed 13 is about 1/2; hence, the observed deviation
of 13 units cannot be considered abnormal. 
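
The six probabilities can be obtained by counting the hands of each kind among all $C_{52}^5$ selections of 5 cards. The following sketch is one way of arranging the count; the labels in the dictionary are modern names introduced here and do not appear in the text.

```python
from fractions import Fraction
from math import comb

TOTAL = comb(52, 5)  # number of ways to take 5 cards out of 52

# Number of hands of each kind, classified by denomination pattern
hands = {
    "all different":   comb(13, 5) * 4**5,
    "one pair":        13 * comb(4, 2) * comb(12, 3) * 4**3,
    "two pairs":       comb(13, 2) * comb(4, 2)**2 * 11 * 4,
    "three of a kind": 13 * comb(4, 3) * comb(12, 2) * 4**2,
    "three and two":   13 * comb(4, 3) * 12 * comb(4, 2),
    "four of a kind":  13 * comb(4, 4) * 12 * 4,
}
probs = {kind: Fraction(count, TOTAL) for kind, count in hands.items()}

observed = [503, 436, 45, 14, 2, 0]   # frequencies reported in 1,000 drawings
for (kind, p), obs in zip(probs.items(), observed):
    # theoretical frequency in 1,000 drawings, rounded to the nearest integer
    print(f"{kind:16s} {p}  theoretical {round(1000 * p)}  observed {obs}")
```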

VI. Bancroft H. Brown published, in the American Mathematical 
Monthly, vol. 26, page 351, the results of a series of 9,900 games of craps. 
This game is played with two dice, and the caster wins unconditionally 
if he produces 7 or 11 points, which are called “naturals”; he loses the
game in case of 2, 3, or 12 points, called “craps.” But if he produces
4, 5, 6, 8, 9, or 10 “points,” he does not win, but has the right to cast the 
dice an unlimited number of times until he throws the same number of 
points that he had before, or until he throws a 7. If he throws 7 before 
obtaining his point, he loses the game; otherwise he wins. 

It is a good exercise to find the probability of winning this game. 
It is 

$$\frac{244}{495} = 0.493$$

that is, a little less than 1/2. Multiplying the number of games, in our
case 9,900, by this probability, we find that the theoretical number of
successes is 4,880 and of failures, 5,020. Now, according to Bancroft H.
Brown, the actual numbers of successes and losses are, respectively,
4,871 and 5,029. The discrepancy

$$4871 - 4880 = -9$$

is extremely small, even smaller than could reasonably be expected. 
The same article gives the number of times “craps” were produced; 
namely, 2 appeared 259 times, 3 appeared 508 times, and 12 appeared 
293 times, making the total number of craps 1,060. The probability 
of obtaining craps is 


$$\frac{1}{36} + \frac{2}{36} + \frac{1}{36} = \frac{1}{9};$$

hence, the theoretical number of craps should be 1,100. The discrepancy,
$1060 - 1100 = -40$, is more considerable this time but still lies within
reasonable limits. 
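
For the reader who wishes to check the exercise, the probability of winning at craps can be computed with exact fractions. The sketch below assumes the rules exactly as described above and uses the fact that, once a point is established, only the throws producing that point or a 7 decide the game; the variable names are illustrative.

```python
from fractions import Fraction

# Number of ways to throw s points with two dice, s = 2, ..., 12
ways = {s: 6 - abs(s - 7) for s in range(2, 13)}
prob = {s: Fraction(w, 36) for s, w in ways.items()}

# Immediate win on a "natural"
win = prob[7] + prob[11]

# With a point, the caster wins if the point reappears before a 7:
# conditional probability P(point) / (P(point) + P(7))
for point in (4, 5, 6, 8, 9, 10):
    win += prob[point] * prob[point] / (prob[point] + prob[7])

craps = prob[2] + prob[3] + prob[12]

print(win, float(win))        # 244/495
print(9900 * win, 9900 * craps)
```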

VII. E. Czuber made a complete investigation of lotteries operated
on the same plan as the French lottery, in Prague between 1754 and 1886,
and in Brünn between 1771 and 1886. The number of drawings was
2,854 in Prague and 2,703 in Brünn. The probability that in each drawing
the sequence of numbers is either increasing or decreasing, is

$$\frac{1}{60} = 0.01667$$

while the observed relative frequency of such cases was
Prague: 0.01612; Brünn: 0.01739
and in both places combined 

0.01674. 

The probabilities that among five numbers in each drawing there is 
none or only one of the numbers 1, 2, 3, . . . 9, are, respectively, 


0.58298 and 0.34070.

The corresponding relative frequencies were

Prague: 0.58655 and 0.32656
Brünn: 0.57899 and 0.34591

and in both places combined 

0.58183 and 0.33587, respectively. 

The probability of drawing a determined number is 1/18. Now, according
to Czuber, for the lottery in Prague the actual number of occurrences for 
single tickets varied from 138 (for No. 6) to 189 (for No. 83), so that for 
all tickets the discrepancy varied from —20 to 31. Besides, there were 
only 16 numbers with a discrepancy greater than 15 in absolute value. 
All these results stand in good accord with the theory. 
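
The theoretical probabilities quoted for these lotteries can be recomputed under the assumption, consistent with the French plan, that 5 numbers are drawn out of 90 without replacement. The variable names below are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

# Of the 5! possible orders of five distinct numbers, exactly two are monotone
p_monotone = Fraction(2, factorial(5))

# None, or exactly one, of the nine numbers 1, 2, ..., 9 among the five drawn
p_none = Fraction(comb(81, 5), comb(90, 5))
p_one = Fraction(9 * comb(81, 4), comb(90, 5))

# A determined number appears in a drawing
p_single = Fraction(5, 90)

print(float(p_monotone), float(p_none), float(p_one), p_single)
# Expected number of appearances of a given number in the 2,854 Prague drawings
print(float(2854 * p_single))
```

The last line gives the number from which the discrepancies of single tickets in Prague are reckoned.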

VIII. One of the most striking experimental tests of Bernoulli's
theorem was made in connection with a problem considered for the first
time by Buffon. A board is ruled with a series of equidistant parallel
lines, and a very fine needle, which is shorter than the distance between
lines, is thrown at random on the board. Denoting by l the length of
the needle and by h the distance between lines, the probability that the
needle will intersect one of the lines (the other possibility is that the
needle will be completely contained within the strip between two lines) is
found to be

$$p = \frac{2l}{h\pi}.$$

The remarkable thing about this expression is that it contains the
number $\pi = 3.14159 \ldots$ expressing the ratio of the circumference of a
circle to its diameter. In the appendix we shall indicate how this expression
can be obtained, because in this problem we deal with a different
concept of probability.

Suppose we throw the needle a great many times and count the 
number of times it cuts the lines. By Bernoulli's theorem we may expect 
that the relative frequency of intersections will not differ greatly from 
the theoretical probability, so that, equating them, we have the means of 
finding an approximate value of $\pi$.

One series of experiments of this kind was performed by R. Wolf, 
astronomer in Zurich, between 1849 and 1853. In his experiments the 
width of the strips was 45 mm., and the length of the needle was 36 mm. 
Thus the theoretical probability of intersections is

$$\frac{72}{45\pi} = 0.5093.$$

The needle was thrown 5,000 times and it cut the lines 2,532 times;
whence, the relative frequency

$$\frac{2532}{5000} = 0.5064.$$

The agreement between the two numbers is very satisfactory. If,
relying on Bernoulli’s theorem, we set the approximate equation

$$\frac{72}{45\pi} = 0.5064,$$

we should find the number 3.1596 for $\pi$, which differs from the known
value of $\pi$ by less than 0.02.

In another experiment of the same kind reported by De Morgan in
the aforementioned book, Ambrose Smith in 1855 made 3,204 trials with
a needle the length of which was 3/5 of the distance between lines. There
were 1,213 clear intersections, and 11 contacts on which it was difficult
to decide. If on this ground, we should consider half of them as intersections,
we should obtain about 1,218 intersections in 3,204 trials, which
would give the number 3.155 for $\pi$. If all of the contacts had been treated
as intersections the result would have been 3.1412, very close to the
real value of $\pi$.

In an excellent book “Calcolo delle Probabilità,” vol. 1, page 183,
1925, by G. Castelnuovo, reference is made to experiments performed by
Professor Reina under whose direction a needle of 3 cm. in length was
thrown 2,520 times, the distance between lines being 6 cm. Taking into
account the thickness of the needle, the probability of intersection was 
found to be 0.345, while actual experiments gave the relative frequency 
of intersections as 0.341. 
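
The approximate values of $\pi$ reported for the experiments of Wolf and Smith can be recomputed by equating the relative frequency of intersections to $2l/h\pi$ and solving for $\pi$. This is a sketch for illustration; the function name `pi_estimate` is an assumption, and for Smith's experiment half of the 11 doubtful contacts are counted as 5.5 intersections.

```python
def pi_estimate(length: float, spacing: float, hits: float, throws: int) -> float:
    """Solve hits / throws = 2 * length / (spacing * pi) for pi."""
    return 2 * length * throws / (spacing * hits)

# Wolf: needle 36 mm, strips 45 mm, 2,532 intersections in 5,000 throws
wolf = pi_estimate(36, 45, 2532, 5000)

# Smith: needle 3/5 of the distance between lines, 3,204 trials
smith_half = pi_estimate(3, 5, 1213 + 11 / 2, 3204)   # half of the contacts counted
smith_all = pi_estimate(3, 5, 1213 + 11, 3204)        # all contacts counted

print(round(wolf, 4), round(smith_half, 3), round(smith_all, 4))
```

Only the ratio of the needle's length to the distance between lines enters the formula, so the units chosen are immaterial.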

Appendix 

Buffon’s Needle Problem. Let h be the width of the strip between
two lines and l < h the length of the needle. The position of the needle
can be determined by the distance x of its middle point from the nearest
line and the acute angle $\varphi$ formed by the needle and a perpendicular
dropped from the middle point to the line. It is apparent that x may
vary from 0 to h/2 and $\varphi$ varies within the limits 0 and $\pi/2$. We cannot
define in the usual way the probability of the needle cutting the line, for 
there are infinitely many cases with respect to the position of the needle. 
However, it is possible to treat this problem as the limiting case of 
another problem with a finite number of possible cases, where the usual 
definition of probability can be applied. 

Suppose that h/2 is divided into an arbitrary number m of equal
parts $\delta = h/2m$ and the right angle $\pi/2$ into n equal parts $\omega = \pi/2n$.
Suppose, further, that the distance x may have only the values

$$0, \delta, 2\delta, \ldots, m\delta$$

and the angle $\varphi$ the values

$$0, \omega, 2\omega, \ldots, n\omega.$$

This gives

$$N = (m + 1)(n + 1)$$

cases as to the position of the needle, and it is reasonable to assume that
these cases are equally likely. To find the number of favorable cases, we
notice that the needle cuts one of the lines if x and $\varphi$ satisfy the inequality

$$x < \frac{l}{2}\cos\varphi.$$

The number of favorable cases therefore, is equal to the number of
systems of integers i, j satisfying the inequality

$$i\delta < \frac{l}{2}\cos j\omega \tag{A}$$

supposing that i may assume only the values 0, 1, 2, ... m and j only
the values 0, 1, 2, ... n. Because we suppose l < h the greatest
value of i satisfying condition (A) is less than m and we can disregard
the requirement that i should be $\leq m$. Now for given j there are k + 1
values of i satisfying (A) if k denotes the greatest integer which is less
than

$$\frac{l}{2\delta}\cos j\omega.$$

In other words, k is an integer determined by the conditions

$$k < \frac{l}{2\delta}\cos j\omega \leq k + 1.$$

The number of possible values for i corresponding to a given j can
therefore be represented thus

$$m_j = \frac{l}{2\delta}\cos j\omega + \theta_j$$

where $\theta_j$ may depend on j but for all j is $\geq 0$ and $< 1$. Taking the sum
of all the $m_j$ corresponding to j = 0, 1, 2, . . . n, we obtain the number
of favorable cases

$$M = \frac{l}{2\delta}(1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega) + n\theta$$

where $\theta$ again is a number satisfying the inequalities

$$0 \leq \theta < 1.$$

But, as is well known,

$$1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega = \frac{1}{2} + \frac{\sin\left(n + \frac{1}{2}\right)\omega}{2\sin\frac{\omega}{2}}$$

or, because $\omega = \pi/2n$,

$$1 + \cos\omega + \cos 2\omega + \cdots + \cos n\omega = \frac{1}{2} + \frac{1}{2}\cot\frac{\omega}{2};$$

therefore

$$M = \frac{l}{4\delta}\cot\frac{\omega}{2} + \frac{l}{4\delta} + n\theta.$$

Dividing this by iV = (m 4 l)(n 4 1) and substituting for b and cd 
their expressions 

s, h TT 

2m ^ 2n 

we obtain the probability in the problem with a finite number of cases 


^ ^ ryi 4n m 1 _n0_ 

N 2A m 4 1 w 4 1 2A m 4 1 n 4 1 (n 4 l)(w 4 1) 

The probability in Buff on \s problem will be obtained by making m 
and n increase indefinitely in the above expression. Now, since 


lim 


m 4 1 


= 1, 


lim , 


m 


(m 4 l)(w 4 1) 


= lim 


{n 4 l){m 4 1) 


and 


cot 


lim 


/i 4 1 


An 4 

TT 


we have 


M 21 

hm -Tr = r- 
N hir 


Thus we arrive at the expression of probability 

V 


hir 


(m, 71 cc] 


in Buffon's needle problem. 
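
The limiting argument above can be checked numerically. The following is only a sketch: the function name `buffon_discrete` and the sample values of `l`, `h`, `m`, `n` are illustrative choices, not part of the text. It counts the favorable pairs directly from the inequality (A) and compares the ratio of favorable to possible cases with the limit, 2l divided by h times pi.

```python
import math

def buffon_discrete(l, h, m, n):
    """Return M/N for the discretized needle problem (assumes l < h)."""
    delta = h / (2 * m)        # step of the distance x
    omega = math.pi / (2 * n)  # step of the angle phi
    favorable = 0
    for j in range(n + 1):
        bound = 0.5 * l * math.cos(j * omega)
        # count the values i = 0, 1, ..., m with i*delta < (l/2) cos(j*omega)
        favorable += sum(1 for i in range(m + 1) if i * delta < bound)
    return favorable / ((m + 1) * (n + 1))

if __name__ == "__main__":
    l, h = 1.0, 2.0  # hypothetical needle length and line spacing
    for size in (10, 100, 400):
        print(size, buffon_discrete(l, h, size, size), 2 * l / (h * math.pi))
```

As m and n grow, the printed ratio should settle near the limiting value, with a discrepancy of the order of 1/m and 1/n, as the three terms of the finite expression suggest.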
Problems for Solution 

Another very simple proof of Bernoulli's theorem, due to Tshebysheff (1821-1894), 
is based upon the following considerations: 

1. Prove the following identities: 

$$\sum_{m=0}^{n} T_m(m - np) = 0, \qquad \sum_{m=0}^{n} T_m(m - np)^2 = npq.$$

Indication of the Proof. Differentiate the identity 

$$e^{-npu}(pe^{u} + q)^n = \sum_{m=0}^{n} T_m e^{(m-np)u}$$

twice with respect to $u$ and set $u = 0$. 

2. If $Q$ is the probability of the inequality $|m - np| \ge n\epsilon$ prove that 

$$Q \le \frac{pq}{n\epsilon^2}.$$

Indication of the Proof. In the identity 

$$\sum_{m=0}^{n} T_m(m - np)^2 = npq$$

drop all the terms in which $|m - np| < n\epsilon$ and in the remaining terms replace 
$(m - np)^2$ by $n^2\epsilon^2$. The resulting inequality 

$$n^2\epsilon^2\sum_{|m-np| \ge n\epsilon} T_m \le npq$$

is equivalent to the statement. 

3. Prove that 

$$P > 1 - \eta$$

if $n > pq/\eta\epsilon^2$. 

Indication of the Proof. $P = 1 - Q$, $Q \le pq/n\epsilon^2$ and $pq/n\epsilon^2 < \eta$ if $n > pq/\eta\epsilon^2$. 
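
The identities of Prob. 1 and the inequality of Prob. 2 are easy to verify for particular values. The sketch below assumes nothing beyond the binomial expression of the probabilities of m successes; the helper names (`binomial_pmf`, `first_two_moments`, `tail_probability`) and the sample values are illustrative only.

```python
from math import comb

def binomial_pmf(n, p):
    """List of T_m, the probability of m successes in n trials."""
    q = 1.0 - p
    return [comb(n, m) * p**m * q**(n - m) for m in range(n + 1)]

def first_two_moments(n, p):
    """Return (sum T_m (m - np), sum T_m (m - np)^2)."""
    T = binomial_pmf(n, p)
    s1 = sum(T[m] * (m - n * p) for m in range(n + 1))
    s2 = sum(T[m] * (m - n * p) ** 2 for m in range(n + 1))
    return s1, s2

def tail_probability(n, p, eps):
    """Q: probability of the inequality |m - np| >= n*eps."""
    T = binomial_pmf(n, p)
    return sum(T[m] for m in range(n + 1) if abs(m - n * p) >= n * eps)

if __name__ == "__main__":
    n, p, eps = 60, 0.3, 0.1
    print(first_two_moments(n, p), n * p * (1 - p))
    print(tail_probability(n, p, eps), p * (1 - p) / (n * eps**2))
```

The second printed line shows how far the actual probability Q usually lies below the bound; the inequality is crude but requires no analysis at all.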
The following two problems show how probability considerations can be used in 
proving purely analytical propositions. 

4. S. Bernstein's Proof of Weierstrass' Theorem. The famous theorem due to 
Weierstrass states that for any continuous function $f(x)$ in a closed interval 
$a \le x \le b$ there exists a polynomial $P(x)$ such that 

$$|f(x) - P(x)| < \sigma$$

for $a \le x \le b$ where $\sigma$ is an arbitrary positive number. By a proper linear 
transformation the interval $(a, b)$ can be transformed into the interval $(0, 1)$. 
According to S. Bernstein, the polynomial 

$$P(x) = \sum_{m=0}^{n} \binom{n}{m} x^m (1-x)^{n-m} f\left(\frac{m}{n}\right)$$

for sufficiently large $n$ satisfies the inequality 

$$|f(x) - P(x)| < \sigma$$

uniformly in the interval $0 \le x \le 1$. 

Indication of the Proof. For $x = 0$ and $x = 1$ we have $f(0) = P(0)$ and 
$f(1) = P(1)$. It suffices to prove the statement for $0 < x < 1$. Let $x$ be a 
constant probability in $n$ independent trials. We have 

(a) $$f(x) - P(x) = \sum_{m=0}^{n} T_m\left[f(x) - f\left(\frac{m}{n}\right)\right].$$

By the property of continuous functions, there is a number $\epsilon$ corresponding to any 
positive number $\sigma$ such that 

$$|f(x') - f(x)| < \frac{\sigma}{2}$$

whenever 

$$|x' - x| < \epsilon \qquad (0 \le x', x \le 1).$$

Also, there exists a number $M$ such that $|f(x)| \le M$ for $0 \le x \le 1$. From equation 
(a) we get 

$$|f(x) - P(x)| \le \frac{\sigma}{2}P + 2MR$$

where $P$ and $R$ are, respectively, the probabilities of the inequalities 

$$\left|\frac{m}{n} - x\right| < \epsilon \quad\text{and}\quad \left|\frac{m}{n} - x\right| \ge \epsilon.$$

Now $P \le 1$ and 

$$R < \eta \quad\text{if}\quad n > \frac{1}{4\eta\epsilon^2}.$$

Take $\eta = \sigma/4M$; then 

$$|f(x) - P(x)| < \sigma \quad\text{if}\quad n > \frac{M}{\sigma\epsilon^2}.$$
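
A short numerical illustration of Bernstein's polynomial may help here. It is a sketch only: the function names `bernstein` and `max_error`, the test function (the absolute value of x minus one half), and the grid of 201 points are assumptions chosen for illustration and do not come from the text.

```python
from math import comb

def bernstein(f, n, x):
    """Bernstein polynomial of degree n for f, evaluated at x in [0, 1]."""
    return sum(comb(n, m) * x**m * (1 - x)**(n - m) * f(m / n)
               for m in range(n + 1))

def max_error(f, n, grid=201):
    """Largest |f(x) - P(x)| over an equally spaced grid on [0, 1]."""
    points = [k / (grid - 1) for k in range(grid)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in points)

if __name__ == "__main__":
    f = lambda x: abs(x - 0.5)   # continuous, but with a corner at x = 1/2
    for n in (5, 20, 80, 200):
        print(n, max_error(f, n))
```

The error decreases slowly, roughly like the inverse square root of n for this function, which agrees with the estimate of the required n obtained from Tshebysheff's inequality.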


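5. Show that 

$$\frac{\int_{\frac{m}{n}-\epsilon}^{\frac{m}{n}+\epsilon} x^m(1-x)^{n-m}\,dx}{\int_0^1 x^m(1-x)^{n-m}\,dx} > 1 - \frac{1}{2(n+1)\epsilon^2}$$

provided $0 < m < n$ and $\frac{m}{n} - \epsilon > 0$, $\frac{m}{n} + \epsilon < 1$ (Castelnuovo). 

Indication of the Proof. By Prob. 6, Chap. IV, page 72, the ratio 

$$\frac{\int_0^p x^m(1-x)^{n-m}\,dx}{\int_0^1 x^m(1-x)^{n-m}\,dx}$$

represents the probability $Q$ of at least $m + 1$ successes in a series of $n + 1$ 
independent trials with constant probability $p$. Set 

$$p = \frac{m}{n} - \epsilon$$

whence 

$$m + 1 = (n+1)p + (n+1)\sigma, \qquad \sigma = \frac{n-m}{n(n+1)} + \epsilon > \epsilon.$$

But 

$$Q < \frac{p(1-p)}{(n+1)\sigma^2} < \frac{1}{4(n+1)\epsilon^2}.$$

Hence 

$$\frac{\int_0^{\frac{m}{n}-\epsilon} x^m(1-x)^{n-m}\,dx}{\int_0^1 x^m(1-x)^{n-m}\,dx} < \frac{1}{4(n+1)\epsilon^2}$$

and by a similar argument 

$$\frac{\int_{\frac{m}{n}+\epsilon}^{1} x^m(1-x)^{n-m}\,dx}{\int_0^1 x^m(1-x)^{n-m}\,dx} < \frac{1}{4(n+1)\epsilon^2}.$$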

References 

Jacob Bernoulli: “Ars Conjectandi,” pars quarta, 1713. 

P. L. Tshebysheff: “Sur les valeurs moyennes,” Oeuvres, I, pp. 687-694. 

F. P. Cantelli: “Sulla probabilità come limite di frequenza,” Rend. d. R. Accad. 
Naz. dei Lincei, 26, 1917. 

A. Markoff: “Calculus of Probability,” 4th Russian ed., Leningrad, 1924. 



CHAPTER VII 


APPROXIMATE EVALUATION OF PROBABILITIES IN 
BERNOULLIAN CASE 

1. In connection with Bernoulli's theorem, the following important 
question arises: when the number of trials is large, how can one find, at 
least approximately, the probability of the inequality 

$$\left|\frac{m}{n} - p\right| \le \epsilon$$

where $\epsilon$ is a given number? Or, in a more general form: How can one 
find, approximately, the probability of the inequalities 

$$l \le m \le l'$$

where $l$ and $l'$ are given integers, the number of trials $n$ being large? 

The exact formula for this probability is 

$$P = \sum_{s=l}^{l'} T_s$$

where $T_s$, as before, represents the probability of $s$ successes in $n$ trials. 
While this formula cannot be of any practical use when $n$ and $l' - l$ 
are large numbers, yet it is precisely such cases that present the greatest 
theoretical and practical interest. Hence, the problem naturally arises 
of substituting for the exact expression of $P$ an approximate formula 
which will be easy to use in practice and which, for large $n$, will give a 
sufficiently close approximation to $P$. De Moivre was the first 
successfully to attack this difficult problem. After him, in essentially the 
same way, but using more powerful analytical tools, Laplace succeeded 
in establishing a simple approximate formula which is given in all books 
on probability. 

When we use an approximate formula instead of an exact one, there 
is always this question to consider: How large is the committed error? 
If, as is usually done, this question is left unanswered, the derivation of 
Laplace's formula becomes an easy matter. However, to estimate the 
error a comparatively long and detailed investigation is required. Except 
for its length, this investigation is not very difficult. 

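2. First we shall present the probability $T_s$ in a convenient analytical 
form. The identity 

$$F(t) = (pt + q)^n = T_0 + T_1 t + T_2 t^2 + \cdots + T_n t^n$$

after substituting $t = e^{i\varphi}$ becomes 

$$F(e^{i\varphi}) = T_0 + T_1 e^{i\varphi} + T_2 e^{2i\varphi} + \cdots + T_n e^{ni\varphi}.$$

Multiplying it by $e^{-is\varphi}$ and integrating between $-\pi$ and $\pi$, we get 

$$\int_{-\pi}^{\pi} e^{-is\varphi}F(e^{i\varphi})\,d\varphi = 2\pi T_s$$

because for an integral exponent $k$ 

$$\int_{-\pi}^{\pi} e^{ik\varphi}\,d\varphi = 0 \ \text{ if } k \ne 0, \qquad \int_{-\pi}^{\pi} e^{ik\varphi}\,d\varphi = 2\pi \ \text{ if } k = 0.$$

Thus 

$$T_s = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-is\varphi}F(e^{i\varphi})\,d\varphi$$

and this is the expression for $T_s$ suitable for our purposes. To find the 
sum 

$$P = \sum_{s=l}^{l'} T_s$$

we observe first that 

$$\sum_{s=l}^{l'} e^{-is\varphi} = \frac{e^{-i(l-\frac{1}{2})\varphi} - e^{-i(l'+\frac{1}{2})\varphi}}{2i\sin\frac{\varphi}{2}}.$$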
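
The integral expression for the probability of s successes can be tested numerically. The sketch below is an illustration under stated assumptions: the function name `T_fourier` is hypothetical, and the integral is replaced by an equally spaced sum over n + 1 points, which reproduces it exactly (up to rounding) because the integrand is a trigonometric polynomial whose frequencies do not exceed n in absolute value.

```python
import cmath
from math import comb, pi

def T_fourier(n, p, s):
    """T_s = (1/2pi) * integral over (-pi, pi) of exp(-i s phi) F(exp(i phi)) d phi.

    Evaluated by the rectangle rule with n + 1 equally spaced nodes.
    """
    q = 1 - p
    nodes = n + 1
    total = 0j
    for k in range(nodes):
        phi = -pi + 2 * pi * k / nodes
        total += cmath.exp(-1j * s * phi) * (p * cmath.exp(1j * phi) + q) ** n
    return (total / nodes).real

if __name__ == "__main__":
    n, p = 30, 0.4
    for s in (0, 10, 12, 30):
        exact = comb(n, s) * p**s * (1 - p) ** (n - s)
        print(s, T_fourier(n, p, s), exact)
```

The two columns agree to rounding error, which confirms that the integral singles out the coefficient of the s-th power in the expansion of F.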










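On the other hand, the complex number $F(e^{i\varphi})$ can be presented in 
trigonometrical form, thus: 

$$F(e^{i\varphi}) = Re^{i\Theta}$$

whence 

$$P = \frac{1}{4\pi i}\int_{-\pi}^{\pi} R\,\frac{e^{i[\Theta - (l-\frac{1}{2})\varphi]} - e^{i[\Theta - (l'+\frac{1}{2})\varphi]}}{\sin\frac{\varphi}{2}}\,d\varphi$$

or, because $P$ is real, 

$$P = \frac{1}{4\pi}\int_{-\pi}^{\pi} R\,\frac{\sin[\Theta - (l-\frac{1}{2})\varphi] - \sin[\Theta - (l'+\frac{1}{2})\varphi]}{\sin\frac{\varphi}{2}}\,d\varphi.$$

Finally, because $R$ is an even function of $\varphi$ and $\Theta$ is an odd one, we can 
extend the integration over the interval $0$, $\pi$ on the condition that we 
double the result. Thus we obtain 

$$P = \frac{1}{2\pi}\int_0^{\pi} R\,\frac{\sin[(l'+\frac{1}{2})\varphi - \Theta]}{\sin\frac{\varphi}{2}}\,d\varphi - \frac{1}{2\pi}\int_0^{\pi} R\,\frac{\sin[(l-\frac{1}{2})\varphi - \Theta]}{\sin\frac{\varphi}{2}}\,d\varphi.$$

It is convenient to introduce instead of $l$ and $l'$ two numbers $\zeta_1$ and $\zeta_2$ 
defined by 

$$l = np + \tfrac{1}{2} + \zeta_1\sqrt{B_n}, \qquad l' = np - \tfrac{1}{2} + \zeta_2\sqrt{B_n}$$

where $B_n = npq$. Setting further 

$$\Theta = np\varphi + \chi,$$

$P$ can be presented as 

$$P = P_2 - P_1$$

where $P_1$ and $P_2$ are obtained by taking $\zeta = \zeta_1$ and $\zeta = \zeta_2$ in the integral 

(1) $$J = \frac{1}{2\pi}\int_0^{\pi} R\,\frac{\sin(\zeta\sqrt{B_n}\,\varphi - \chi)}{\sin\frac{\varphi}{2}}\,d\varphi.$$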

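3. Our next aim is to establish upper and lower limits for $R$. 
Evidently 

$$R = (p^2 + q^2 + 2pq\cos\varphi)^{\frac{n}{2}} = \left(1 - 4pq\sin^2\frac{\varphi}{2}\right)^{\frac{n}{2}} = \rho^n.$$

Now 

$$\log\rho = \frac{1}{2}\log\left(1 - 4pq\sin^2\frac{\varphi}{2}\right) = -2pq\sin^2\frac{\varphi}{2} - \frac{1}{4}(4pq)^2\sin^4\frac{\varphi}{2} - \frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2} - \cdots$$

whence 

$$\log\rho < -2pq\sin^2\frac{\varphi}{2}.$$

Since $\varphi/2 \le \pi/2$, we have 

$$\sin\frac{\varphi}{2} \ge \frac{\varphi}{\pi}$$

and consequently 

$$\log\rho < -\frac{2pq}{\pi^2}\varphi^2, \qquad \rho < e^{-\frac{2pq}{\pi^2}\varphi^2}$$

for all values of $\varphi$ in the interval of integration. On the other hand, we 
have 

$$\sin\frac{\varphi}{2} > \frac{\varphi}{2} - \frac{\varphi^3}{48} > 0 \quad\text{for}\quad \varphi^2 < 24,$$

$$\sin^2\frac{\varphi}{2} > \frac{\varphi^2}{4} - \frac{\varphi^4}{48},$$

which gives another upper bound for $\rho$: 

$$\rho < e^{-\frac{pq}{2}\varphi^2 + \frac{pq}{24}\varphi^4}.$$

The corresponding upper bounds for $R$ will be 

(4) $$R < e^{-\frac{2B_n}{\pi^2}\varphi^2}$$

(5) $$R < e^{-\frac{B_n}{2}\varphi^2 + \frac{B_n}{24}\varphi^4}.$$

To find a lower bound for $R$ we shall assume $\varphi \le \pi/2$. We can 
present $\log\rho$ thus: 

$$\log\rho = -\frac{pq}{2}\varphi^2 - \frac{1}{4}(4pq)^2\sin^4\frac{\varphi}{2} + 2pq\left(\frac{\varphi^2}{4} - \sin^2\frac{\varphi}{2}\right) - \frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2} - \cdots$$

On the other hand, 

$$\frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2} + \frac{1}{8}(4pq)^4\sin^8\frac{\varphi}{2} + \cdots < \frac{\frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2}}{1 - 4pq\sin^2\frac{\varphi}{2}} \le \frac{1}{3}(4pq)^3\sin^6\frac{\varphi}{2}$$

so that 

$$2pq\left(\frac{\varphi^2}{4} - \sin^2\frac{\varphi}{2}\right) - \frac{1}{6}(4pq)^3\sin^6\frac{\varphi}{2} - \cdots > \frac{2pq}{3}\sin^4\frac{\varphi}{2} - \frac{1}{3}(4pq)^3\sin^6\frac{\varphi}{2} = \frac{2pq}{3}\sin^4\frac{\varphi}{2}\left\{1 - 32p^2q^2\sin^2\frac{\varphi}{2}\right\} \ge 0$$

and consequently 

$$\log\rho > -\frac{pq}{2}\varphi^2 - \frac{1}{4}(4pq)^2\sin^4\frac{\varphi}{2} > -\frac{pq}{2}\varphi^2 - \frac{p^2q^2}{4}\varphi^4$$

if $\varphi \le \pi/2$. Hence, 

(6) $$R > e^{-\frac{B_n}{2}\varphi^2 - \frac{pqB_n}{4}\varphi^4}$$

and this is valid for $\varphi \le \pi/2$. 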

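4. Let $\tau$ be defined by 

$$\tau^2 = 3B_n^{-\frac{1}{2}}.$$

Assuming $B_n \ge 25$ from now on, we shall have 

$$\tau^2 \le \frac{3}{5}$$

and a fortiori $\tau < \pi/2$. Let us suppose now that $\varphi$ varies in the interval 
$0 \le \varphi \le \tau$. By inequality (6) we shall have 

$$R - e^{-\frac{1}{2}B_n\varphi^2} > e^{-\frac{1}{2}B_n\varphi^2}\left(e^{-\frac{1}{4}pqB_n\varphi^4} - 1\right) > -\frac{pq}{4}B_n\varphi^4 e^{-\frac{1}{2}B_n\varphi^2} \ge -\frac{1}{16}B_n\varphi^4 e^{-\frac{1}{2}B_n\varphi^2}$$

because $e^{-x} - 1 > -x$ for $x > 0$ and $pq \le \frac{1}{4}$. 

On the other hand, using inequality (5), we find that 

$$R - e^{-\frac{1}{2}B_n\varphi^2} < e^{-\frac{1}{2}B_n\varphi^2}\left(e^{\frac{1}{24}B_n\varphi^4} - 1\right) < \frac{1}{24}B_n\varphi^4 e^{\frac{1}{24}B_n\tau^4}e^{-\frac{1}{2}B_n\varphi^2} < \frac{1}{16}B_n\varphi^4 e^{-\frac{1}{2}B_n\varphi^2}$$

since 

$$\frac{B_n\tau^4}{24} = \frac{3}{8} \quad\text{and}\quad e^{\frac{3}{8}} < \frac{3}{2}.$$

From the two inequalities just established it follows that 

(7) $$\left|R - e^{-\frac{1}{2}B_n\varphi^2}\right| < \frac{1}{16}B_n\varphi^4 e^{-\frac{1}{2}B_n\varphi^2}$$

in the interval 

$$0 \le \varphi \le \tau.$$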
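
The bounds on the modulus R obtained in Sections 3 and 4 can be checked on a grid of angles. The following is a sketch under explicit assumptions: R is computed as the n-th power of the modulus of p times the unit complex exponential plus q; the bounds are taken in the forms R below exp(-2 B phi^2 / pi^2) for (4), R below exp(-B phi^2/2 + B phi^4/24) for (5), R above exp(-B phi^2/2 - pq B phi^4/4) for (6), and the absolute difference between R and exp(-B phi^2/2) below B phi^4 exp(-B phi^2/2) / 16 for (7), with tau^2 = 3 / sqrt(B). The names `modulus_R` and `check_bounds` are illustrative.

```python
import cmath
import math

def modulus_R(n, p, phi):
    """R = |p*exp(i*phi) + q|**n, the modulus of F(exp(i*phi))."""
    return abs(p * cmath.exp(1j * phi) + (1 - p)) ** n

def check_bounds(n, p, steps=400):
    """Check inequalities (4)-(7) on a grid; requires B = npq >= 25."""
    q = 1 - p
    B = n * p * q
    tau = math.sqrt(3 / math.sqrt(B))
    ok = {"(4)": True, "(5)": True, "(6)": True, "(7)": True}
    for k in range(1, steps + 1):
        phi = math.pi * k / steps            # grid on (0, pi] for (4)
        if not modulus_R(n, p, phi) < math.exp(-2 * B * phi**2 / math.pi**2):
            ok["(4)"] = False
        phi = tau * k / steps                # grid on (0, tau] for (5)-(7)
        R = modulus_R(n, p, phi)
        gauss = math.exp(-B * phi**2 / 2)
        if not R < gauss * math.exp(B * phi**4 / 24):
            ok["(5)"] = False
        if not R > gauss * math.exp(-p * q * B * phi**4 / 4):
            ok["(6)"] = False
        if not abs(R - gauss) < B * phi**4 * gauss / 16:
            ok["(7)"] = False
    return ok

if __name__ == "__main__":
    print(check_bounds(200, 0.3))
    print(check_bounds(100, 0.5))
```

A grid check of this kind proves nothing, but it is a quick safeguard against a slip in the constants.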

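5. We turn now to the angle $\Theta$. Evidently 

$$\Theta = n \operatorname{arc\,tg}\frac{p\sin\varphi}{q + p\cos\varphi} = n\omega$$

where 

$$\omega = \operatorname{arc\,tg}\frac{p\sin\varphi}{q + p\cos\varphi}.$$

By successive derivations with respect to $\varphi$ we find 

$$\frac{d\omega}{d\varphi} = \frac{p^2 + pq\cos\varphi}{p^2 + 2pq\cos\varphi + q^2}, \qquad \frac{d^2\omega}{d\varphi^2} = \frac{pq(p-q)\sin\varphi}{(p^2 + 2pq\cos\varphi + q^2)^2}$$

$$\frac{d^3\omega}{d\varphi^3} = pq(p-q)\,\frac{4pq + (1 - 2pq)\cos\varphi - 2pq\cos^2\varphi}{(p^2 + 2pq\cos\varphi + q^2)^3}$$

$$\frac{d^4\omega}{d\varphi^4} = pq(p-q)\sin\varphi\,\frac{-1 + 4pq + 20p^2q^2 + 8pq(1 - 2pq)\cos\varphi - 4p^2q^2\cos^2\varphi}{(p^2 + 2pq\cos\varphi + q^2)^4}$$

and for $\varphi = 0$ 

$$\left(\frac{d\omega}{d\varphi}\right)_0 = p, \qquad \left(\frac{d^2\omega}{d\varphi^2}\right)_0 = 0, \qquad \left(\frac{d^3\omega}{d\varphi^3}\right)_0 = pq(p - q).$$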

Furthermore, one easily verifies that in the interval 0 ^ ^ ^ v/2 


d*a> 

d^J 

d^1 


^ |M|p - 9l(l - 4pg sin* 

^ 2pg|p - gl^l - 4p5f sin* #>• 


Hence, applying Taylor’s formula and supposing 0 ^ ^ ^ t, we get for x 


(8) X = J^»(p — q)^ + 
where 

(9) |Af| < - g|(l - pqr*)-*, 


or 

(10) X = •£<»>* 
where 

(11) W < ABnIp - 5|(1 - P9T*)-». 

Using inequalities (9) and (11), we easily find 

(12) sin {(y/Bn<p - x) = sin (fV^^) - JHn(p - g)^ cos iSy/Wnip) + r 
where 


(13) |r| < *^n|p - g|(l - pgT*)~V* + Th^lip - g)*(l - v<ir^)~^*p\ 


provided 0 ^ ^ ^ r. 

6. To find an appropriate expression of the integral J we split it into 
two integrals, J i and J 2 , taken respectively between limits 0, t and r, x. 
We have 
because sin Let t\ = t then by inequality (4) 

A 1C A 


r«^< f" « P £2!! 

Jr, <P Jr, V j x/K “ 


-“’du 


But for positive x the following inequality holds: 


(14) 

consequently 


^ 2^2' 


Jx U 
rr,d<p ^ 


Iv^ 


Noting that /2(^) is a decreasing function of (p we have for t g ^ g ri 
^ R(r) < le-ivT". 


Hence, 




and combining this inequality with the one previously established, we 
have finally 


(15) 


\J^\ + 


7. More elaborate considerations are necessary to separate the 
principal term and to estimate the error term in Ji. Making use of the 
inequality 


1 _ 1 

sm X x\ 


6 sin X 


we can present Ji thus: 

J. = 2 ^ ^ 

*P 


where 


\M < —r 

48t sm 


R<pd(Pf 


and, because R < in the interval 0 < ^ < t 


|A|< 


-Br‘- 


32ir sin ^ 
Since $\tau^2 \le \frac{3}{5}$ we find by direct numerical calculation 


-^- < 0.0205, 

32jr sin ^ 


and so, finally, 


|A| < 0.02055-^ 

8 . Referring now to inequality (7), we can write 
1 sin {iVK<P - x) ^ ^ ^ sin (rv^y - x)^ 

27rJo <P 2wJo <p 

where 


lA.I < < 0.04J[{-‘. 

Combining this with the result of the preceding section, we can present 
J 1 thus 

(16) J, = ^ 

and 

IA 2 I < 0.0605^-^ 

9. To simplify the integral in the right member of (16), we substitute 
for sin (f\/J5„^ — x) its expression (12). Taking into account inequal¬ 
ity (13), we get (17): 

2 - x)^ ^ 2 rv,«.^.sin _ 

2 jrJo tp 27rJo <P 

n Bntp* _ 

- ^(p - 9 ) e * cos (fV Bn<p)d<p + Ai 

where 

iAsI < - 9l(l - + 

+ ^B^P - ?)*(! -pgrT’^'e-i^’^Vd^. 

But 

= 8 B-\ B-i 
and so 

|A.| < - 9l(l - P 9 r>)-^ + A(p _ ,)J (1 _ pqr^)->B-K 

Now pq ^ T* ^ Bn ^ 25, consequently 

On the other hand, 

1 - p<?r* g 1 - |{(^) - } = S + |j(P - 9)’. 

and for positive x the maximum of 

is attained for x* = whence it follows that 

Taking into account all this, we have 

|A,| < 0.091P - g\B-\ 

10. As to integrals in the right-hand member of (17) we can write 


(18) 

Si®"*'”’ 

.sin ^ 2 r%_,«.,.sin (rVS.^)^ ^ 

ip 2jrJo <P 

(19) 

B.ip - q) 
br 

|* g-tB.v’^1 COS (X-\/W,ip)d(p = 



= J cos {};VB,.p)d^ + A. 

where 


|A4| < 1 

rj, q> 3ir 

and 




|At| < ^ f e-^'uHu < 

^ ^jr. ta /3 


because 


< xe“** 
for X > 1, as can easily be proved. Finally, taking into account (15), 
(16), (17), (18), (19), we get 


+ COS {iVBn<p)d<p 


^ 0.065 + 0.09|p - gl , 

^ Bn 


since for Bn ^ 25 


~ loK - + + — — < -• 

4 log 2 -t- g ^ 3^ ^ ^ 2 


6 ' 3t 

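It now remains to evaluate definite integrals in (20). We have 

(21) $$\frac{1}{\pi}\int_0^{\infty} e^{-\frac{1}{2}B_n\varphi^2}\,\frac{\sin(\zeta\sqrt{B_n}\,\varphi)}{\varphi}\,d\varphi = \frac{1}{\pi}\int_0^{\infty} e^{-\frac{1}{2}u^2}\,\frac{\sin\zeta u}{u}\,du$$

(22) $$\frac{B_n(p-q)}{6\pi}\int_0^{\infty} e^{-\frac{1}{2}B_n\varphi^2}\varphi^2\cos(\zeta\sqrt{B_n}\,\varphi)\,d\varphi = \frac{p-q}{6\pi\sqrt{B_n}}\int_0^{\infty} e^{-\frac{1}{2}u^2}u^2\cos\zeta u\,du.$$

Differentiating the well-known integral 

$$\int_0^{\infty} e^{-ax^2}\cos bx\,dx = \frac{1}{2}\sqrt{\frac{\pi}{a}}\,e^{-\frac{b^2}{4a}} \qquad (a > 0)$$

twice with respect to $b$, and after that substituting $a = \frac{1}{2}$, $b = \zeta$, we 
find for (22) this expression: 

$$\frac{p-q}{6\sqrt{2\pi B_n}}(1 - \zeta^2)e^{-\frac{1}{2}\zeta^2}.$$

On the other hand, an integral of the type 

$$L(a) = \int_0^{\infty} e^{-\frac{1}{2}u^2}\,\frac{\sin au}{u}\,du$$

can be reduced to a so-called “probability integral.” In fact, the 
derivation with respect to $a$ gives 

$$\frac{dL}{da} = \int_0^{\infty} e^{-\frac{1}{2}u^2}\cos au\,du = \sqrt{\frac{\pi}{2}}\,e^{-\frac{1}{2}a^2}$$

and since $L(0) = 0$, 

$$L(a) = \sqrt{\frac{\pi}{2}}\int_0^{a} e^{-\frac{1}{2}u^2}\,du.$$

Consequently, integral (21) can be reduced to 

$$\frac{1}{\sqrt{2\pi}}\int_0^{\zeta} e^{-\frac{1}{2}u^2}\,du.$$

Having found an approximate expression of the integral $J$, after 
substituting in it $\zeta_2$ and $\zeta_1$ for $\zeta$ and taking the difference of the results, we 
find the desired expression of $P$. 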

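11. The result of this long and detailed investigation can be 
summarized as follows: 

Theorem. Let $m$ be the number of occurrences of an event in a series 
of $n$ independent trials with the constant probability $p$. The probability $P$ 
of the inequalities 

$$np + \tfrac{1}{2} + \zeta_1\sqrt{npq} \le m \le np - \tfrac{1}{2} + \zeta_2\sqrt{npq}$$

where extreme members are integers, can be represented in the form 

(23) $$P = \frac{1}{\sqrt{2\pi}}\int_{\zeta_1}^{\zeta_2} e^{-\frac{1}{2}u^2}\,du + \frac{q-p}{6\sqrt{2\pi npq}}\left[(1-\zeta_2^2)e^{-\frac{1}{2}\zeta_2^2} - (1-\zeta_1^2)e^{-\frac{1}{2}\zeta_1^2}\right] + \omega.$$

The error term $\omega$ satisfies the inequality 

$$|\omega| < \frac{0.13 + 0.18|p-q|}{npq} + e^{-\frac{3}{2}\sqrt{npq}}$$

provided $npq \ge 25$. 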

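By slightly increasing the limit of the error term, this theorem can 
be put into more convenient form. Let $t_1$ and $t_2$ be two arbitrary real 
numbers and let $P$ denote the probability of the inequalities 

$$np + t_1\sqrt{npq} \le m \le np + t_2\sqrt{npq}.$$

If the greatest integers contained in 

$$np + t_2\sqrt{npq} \quad\text{and}\quad nq - t_1\sqrt{npq}$$

are, respectively, $\Lambda_2$ and $\Lambda_1$, the preceding inequalities are equivalent to 

$$n - \Lambda_1 \le m \le \Lambda_2.$$

To apply the theorem, we set 

$$np - \tfrac{1}{2} + \zeta_2\sqrt{npq} = \Lambda_2 = np + t_2\sqrt{npq} - \theta_2$$
$$np + \tfrac{1}{2} + \zeta_1\sqrt{npq} = n - \Lambda_1 = np + t_1\sqrt{npq} + \theta_1$$

$\theta_2$ and $\theta_1$ being, respectively, the fractional parts of $np + t_2\sqrt{npq}$ and 
$nq - t_1\sqrt{npq}$. Hence, 

$$\zeta_2 = t_2 + \frac{\frac{1}{2} - \theta_2}{\sqrt{npq}}, \qquad \zeta_1 = t_1 - \frac{\frac{1}{2} - \theta_1}{\sqrt{npq}}.$$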
Applying Taylor’s formula, it is easy to verify that 


-4= e -^1 e *<fM- 

V^Ju 




. 0.061 

npq 


\/2Tnpg 

-^^=^[(1 - - (1 - - 


l6'\/2Tnpg 


Q\^2wnpq 

I ^PQ 


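whence, finally, we can draw the following conclusion: For any two 
real numbers $t_1$, $t_2$, the probability of the inequalities 

$$t_1\sqrt{npq} \le m - np \le t_2\sqrt{npq}$$

can be expressed as follows: 

$$P = \frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{1}{2}u^2}\,du + \frac{(\frac{1}{2} - \theta_1)e^{-\frac{1}{2}t_1^2} + (\frac{1}{2} - \theta_2)e^{-\frac{1}{2}t_2^2}}{\sqrt{2\pi npq}} + \frac{q-p}{6\sqrt{2\pi npq}}\left[(1-t_2^2)e^{-\frac{1}{2}t_2^2} - (1-t_1^2)e^{-\frac{1}{2}t_1^2}\right] + \Omega$$

where $\theta_2$ and $\theta_1$ are the respective fractional parts of 

$$np + t_2\sqrt{npq} \quad\text{and}\quad nq - t_1\sqrt{npq}$$

and 

$$|\Omega| < \frac{0.20 + 0.25|p-q|}{npq} + e^{-\frac{3}{2}\sqrt{npq}}$$

provided $npq \ge 25$. 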
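
To see how the formula behaves in practice, one may compare it with the exact sum of binomial terms. The sketch below is illustrative and rests on stated assumptions: it takes the approximation as the probability integral between t1 and t2, plus the term with the fractional parts, plus the term with the factor q - p, and it takes the limit of the error as (0.20 + 0.25 |p - q|) / npq + exp(-1.5 sqrt(npq)). The function names `exact_prob` and `refined_approx` are hypothetical.

```python
import math

def exact_prob(n, p, t1, t2):
    """Exact probability of t1*sqrt(npq) <= m - np <= t2*sqrt(npq)."""
    q = 1 - p
    s = math.sqrt(n * p * q)
    lo = max(math.ceil(n * p + t1 * s), 0)
    hi = min(math.floor(n * p + t2 * s), n)
    def log_pmf(m):
        return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
                + m * math.log(p) + (n - m) * math.log(q))
    return sum(math.exp(log_pmf(m)) for m in range(lo, hi + 1))

def refined_approx(n, p, t1, t2):
    """Return (approximate probability, limit of the error term)."""
    q = 1 - p
    npq = n * p * q
    s = math.sqrt(npq)
    theta2 = (n * p + t2 * s) % 1.0   # fractional part of np + t2*sqrt(npq)
    theta1 = (n * q - t1 * s) % 1.0   # fractional part of nq - t1*sqrt(npq)
    e1, e2 = math.exp(-t1 * t1 / 2), math.exp(-t2 * t2 / 2)
    c = 1 / math.sqrt(2 * math.pi * npq)
    main = 0.5 * (math.erf(t2 / math.sqrt(2)) - math.erf(t1 / math.sqrt(2)))
    frac_term = c * ((0.5 - theta1) * e1 + (0.5 - theta2) * e2)
    skew_term = (q - p) / 6 * c * ((1 - t2 * t2) * e2 - (1 - t1 * t1) * e1)
    bound = (0.20 + 0.25 * abs(p - q)) / npq + math.exp(-1.5 * s)
    return main + frac_term + skew_term, bound

if __name__ == "__main__":
    approx, bound = refined_approx(400, 0.3, -1.3, 0.8)
    print(approx, exact_prob(400, 0.3, -1.3, 0.8), bound)
```

In trials of this kind the actual discrepancy is usually far smaller than the theoretical limit, which is to be expected of an estimate valid for every p.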

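In particular, if $t_2 = -t_1 = t$, the probability of the inequality 

$$|m - np| \le t\sqrt{npq}$$

is expressed by 

$$P = \frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}\,du + \frac{1 - \theta_1 - \theta_2}{\sqrt{2\pi npq}}\,e^{-\frac{1}{2}t^2} + \Omega$$

with the same upper limit for $\Omega$. Laplace, supposing that $np + t\sqrt{npq}$ 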
is an integer in which case 02 = 0 and 0i is a fraction less than (npg)”^, 
gives for $P$ the approximate expression 

$$P = \frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}\,du + \frac{e^{-\frac{1}{2}t^2}}{\sqrt{2\pi npq}}$$

without indicating the limit of the error. Evidently Laplace’s formula 
coincides with the formula obtained here by a rigorous analysis, save for 
terms of the same order as the error term $\Omega$. 
To find an approximate expression for the probability $P$ of the 
inequality 

$$\left|\frac{m}{n} - p\right| \le \epsilon$$

it suffices to take 

$$t = \epsilon\sqrt{\frac{n}{pq}}.$$

Then 

$$P = \frac{2}{\sqrt{2\pi}}\int_0^{\epsilon\sqrt{n/pq}} e^{-\frac{1}{2}u^2}\,du + \frac{1 - \theta_1 - \theta_2}{\sqrt{2\pi npq}}\,e^{-\frac{n\epsilon^2}{2pq}} + \Omega$$

and evidently $P$ tends to 1 as $n$ increases indefinitely. This is the second 
proof of Bernoulli’s theorem. 

Referring to the above expression for the probability of the inequalities 

$$t_1\sqrt{npq} \le m - np \le t_2\sqrt{npq}$$

and supposing that the number of trials $n$ increases indefinitely while 
$t_1$ and $t_2$ remain fixed, we immediately perceive the truth of the following 
limit theorem: The probability of the inequalities 

$$t_1 \le \frac{m - np}{\sqrt{npq}} \le t_2$$

tends to the limit 

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{1}{2}u^2}\,du$$

as $n$ tends to infinity. 

This limit theorem is a very particular case of an extremely general 
theorem which we shall consider in Chap. XIV. 

12. To form an idea of the accuracy to be expected by using the 
foregoing approximate formulas, it is worth while to take up a few 
numerical examples. Let $n = 200$, $p = q = \frac{1}{2}$ and 

$$95 \le m \le 105.$$

The exact expression of the probability that $m$ will satisfy these 
inequalities is 

$$P = \frac{200!}{100!\,100!}\left(\frac{1}{2}\right)^{200}\left[1 + 2\left(\frac{100}{101} + \frac{100\cdot 99}{101\cdot 102} + \frac{100\cdot 99\cdot 98}{101\cdot 102\cdot 103} + \frac{100\cdot 99\cdot 98\cdot 97}{101\cdot 102\cdot 103\cdot 104} + \frac{100\cdot 99\cdot 98\cdot 97\cdot 96}{101\cdot 102\cdot 103\cdot 104\cdot 105}\right)\right].$$

The number in the brackets is found to be 9.995776 and its logarithm to 
five decimals 

$$0.99982.$$

The logarithm of the first factor, again to five decimals, is 

$$\bar{2}.75088,$$

whence 

$$\log P = \bar{1}.75070; \qquad P = 0.56325,$$

and this value may be regarded as correct to five decimals. Let us see 
now what result is obtained by using approximate formulas. In our 
example 

$$t\sqrt{npq} = 5; \qquad t = \frac{5}{\sqrt{50}} = \frac{1}{\sqrt{2}} = 0.707107$$

and 

$$\frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}\,du = 0.52050.$$

The additional term 

$$\frac{e^{-0.25}}{\sqrt{100\pi}} = 0.04394$$

and by Laplace’s formula 

$$P = 0.56444.$$

This is greater than the true value of $P$ by 0.00119. Now, the theoretical 
limit of the error is nearly 

$$\frac{0.20}{50} = 0.004$$

so that, actually, Laplace’s formula gives an even closer approximation 
than can be expected theoretically. 
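
The figures of this first example can be reproduced directly. The sketch below is illustrative: the function names `exact_symmetric` and `laplace` are hypothetical, the exact value is obtained by summing binomial coefficients as integers, and the probability integral is expressed through the error function of Python's `math` module.

```python
from math import comb, erf, exp, pi, sqrt

def exact_symmetric(n, lo, hi):
    """Exact probability of lo <= m <= hi when p = q = 1/2."""
    return sum(comb(n, m) for m in range(lo, hi + 1)) / 2**n

def laplace(n, p, t):
    """Laplace's formula: probability integral plus the additional term."""
    npq = n * p * (1 - p)
    return erf(t / sqrt(2)) + exp(-t * t / 2) / sqrt(2 * pi * npq)

if __name__ == "__main__":
    n, p = 200, 0.5
    t = 5 / sqrt(n * p * (1 - p))
    print("exact     ", exact_symmetric(n, 95, 105))
    print("simplified", erf(t / sqrt(2)))
    print("Laplace   ", laplace(n, p, t))
```

The three printed numbers correspond to the values 0.56325, 0.52050 and 0.56444 discussed in the text.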

When $npq$ is large, the second term in Laplace’s formula ordinarily 
is omitted and the probability is computed by using a simpler expression: 

$$P = \frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}\,du.$$

In our case this expression would give 

$$P = 0.52050$$

instead of 0.56325 with the error about 0.043, which amounts to about 
8 per cent of the exact number. Such a comparatively large error is 
explained by the fact that in our example $npq = 50$ is not large enough. 
In practice, when $npq$ attains a few hundreds, the simplified expression for 
$P$ can be used when an accuracy of about two or three decimals is 
considered as satisfactory. In general, the larger $t$ is, the better 
approximation can be expected. 

For the second example, let us evaluate the probability that in 6,520 
trials the relative frequency of an event with the probability $p = \frac{3}{5}$ 
will differ from that probability by less than $\epsilon = \frac{1}{50}$. To find $t$, we 
have the equation 

$$t\sqrt{npq} = \epsilon n$$

where 

$$n = 6520, \quad p = \tfrac{3}{5}, \quad q = \tfrac{2}{5}, \quad \epsilon = \tfrac{1}{50}$$

which gives 

$$t = \frac{130.4}{\sqrt{1564.8}} = 3.2965,$$

and, correspondingly, 

$$\frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}\,du = 0.999021.$$

Since $m$ satisfies the inequalities 

$$3912 - 130.4 \le m \le 3912 + 130.4$$

the fractions $\theta_1$ and $\theta_2$ are $\theta_1 = \theta_2 = 0.4$ and the additional term is 

$$\frac{0.2}{\sqrt{3129.6\pi}}\,e^{-\frac{1}{2}t^2} = 0.000009.$$

Hence, the approximate value of $P$ is 

$$P = 0.999030.$$

To judge what is the error, we can apply Markoff's method of 
continued fractions to find the limits between which $P$ lies. These limits are 

0.999028 and 0.999044. 

The result obtained by using an approximate formula is unusually good, 
which can be explained by the fact that in our example $t$ is a rather large 
number. Even the simplified formula gives 0.999021, very near the 
true value. 

Finally, let us apply our formulas to the solution of the inverse 
problem: How large should the number of trials be to secure a probability 
larger than a given fraction for the inequality 

$$\left|\frac{m}{n} - p\right| \le \epsilon?$$

Let us take, for example, $p = \frac{1}{3}$, $\epsilon = 0.01$ and the lower limit of 
probability 0.999. To find $n$ approximately, we first determine $t$ by the 
equation 

$$\frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}\,du = 0.999,$$

which gives 

$$t = 3.291.$$

Hence, 

$$n = \frac{2}{9}\cdot 10^4\cdot(3.291)^2 = 24{,}066, \text{ approximately.}$$

We cannot be sure that this limit is precise, since an approximate formula 
was used. But it can serve as an indication that for $n$ exceeding this 
limit by a comparatively small amount, the probability in question will 
be $> 0.999$. For instance, let us take $n = 24{,}300$. The limits for $m$ 
being 

$$8{,}100 - 243 \le m \le 8{,}100 + 243,$$

we find $t$ from the equation 

$$t = \epsilon\sqrt{\frac{n}{pq}} = 3.3068$$

and correspondingly 

$$\frac{2}{\sqrt{2\pi}}\int_0^{t} e^{-\frac{1}{2}u^2}\,du = 0.999057.$$

The additional term in Laplace's formula being 0.000023, we find 

$$P > 0.99908 - 0.00006 > 0.999.$$

Thus, 24,300 trials surely satisfy all the requirements. 
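
The steps of the inverse problem can be sketched in a few lines. The code is an illustration under stated assumptions: `solve_t` finds t by bisection, `trials_needed` applies the relation n = pq t^2 / eps^2, and `laplace_lower_bound` subtracts from Laplace's formula the limit of the error taken as (0.20 + 0.25 |p - q|) / npq + exp(-1.5 sqrt(npq)); it also supposes, as in the example, that both limits for m are integers, so that the fractional parts vanish. All three names are hypothetical.

```python
from math import erf, exp, pi, sqrt

def solve_t(target, lo=0.0, hi=10.0):
    """Solve (2/sqrt(2 pi)) * integral_0^t exp(-u^2/2) du = target by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if erf(mid / sqrt(2)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def trials_needed(p, eps, target):
    """First estimate of n from t*sqrt(npq) = eps*n."""
    t = solve_t(target)
    return p * (1 - p) * t * t / eps**2

def laplace_lower_bound(n, p, eps):
    """Laplace's value minus the limit of the error (fractional parts assumed zero)."""
    q = 1 - p
    npq = n * p * q
    t = eps * sqrt(n / (p * q))
    value = erf(t / sqrt(2)) + exp(-t * t / 2) / sqrt(2 * pi * npq)
    error = (0.20 + 0.25 * abs(p - q)) / npq + exp(-1.5 * sqrt(npq))
    return value - error

if __name__ == "__main__":
    print(solve_t(0.999), trials_needed(1 / 3, 0.01, 0.999))
    print(laplace_lower_bound(24300, 1 / 3, 0.01))
```

The first estimate obtained this way may differ by a few units from the figure in the text, since t is there rounded to three decimals.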

Problems for Solution 

1. Find approximately the probability that the number of successes will be 
contained between 2,910 and 3,090 in 9,000 independent trials with constant probability 
$\frac{1}{3}$. Ans. 0.9570 with an error in absolute value $< 10^{-4}$ [using (23)]. 

2. In Buffon's experiment a coin was tossed 4,040 times, with the result that heads 
turned up 2,048 times. What would be the probability of having more than 2,050 
or less than 1,990 heads? Ans. 0.337. 

3. R. Wolf threw a pair of dice 100,000 times and noted that 83,533 times the 
numbers of points on the two dice were different. What is the probability of having 
such an event occur not less than 83,533 or not more than 83,133 times? Does the 
result suggest a doubt that for each die the probability of any number of points was $\frac{1}{6}$? 
Ans. This probability is approximately 0.0898 and on account of its smallness some 
doubt may exist. 

4. If the probability of an event $E$ is $\frac{1}{2}$, what number of trials guarantees a 
probability of more than 0.999 that the difference between the relative frequency of 
$E$ and $\frac{1}{2}$ will be in absolute value less than 0.01? Ans. 27,500. 

5. If a man plays 10,000 equitable games, staking \$1 in each game, what is the 
probability that the increase or decrease in his fortune will not exceed \$20 or \$50? 
Ans. (a) 0.166; (b) 0.390. 

6. If a man plays 100,000 games of craps and stakes 50 cents in each game, what 
is the probability that he will lose less than \$300? Ans. About $\frac{1}{200}$. 

7. Following the method developed in this chapter, prove the following formula 
for the probability of exactly $m$ successes in $n$ independent trials with constant 
probability $p$: 

$$T_m = \frac{1}{\sqrt{2\pi npq}}\,e^{-\frac{1}{2}t^2}\left[1 + \frac{(q-p)(t^3 - 3t)}{6\sqrt{npq}}\right] + \Delta,$$

where $t$ is determined by the equation 

$$m = np + t\sqrt{npq}$$

and 

(npqP 

provided npq ^ 25. 

8. Developments of this chapter can be greatly simplified if $p = q = \frac{1}{2}$ 
(symmetrical case). In this case one can prove the following statement: The probability 
of the inequalities 




n 1 



can be expressed as follows: 




, (f.* - - (fi* - . 

du H-=- + A 

12V2im 


where |A| < l/2n* for n > 16. 

9. In case of “rare” events, the probability $p$ may be so small that even for a 
large number of trials the quantity $\lambda = np$ may be small; for example, 10 or less. 
In cases of this kind, approximation formulas of the type of Laplace’s cannot be used 
with confidence. To meet such cases, Poisson proposed approximate formulas of a 
different character. Let $P_m$ represent the probability that in $n$ trials an event with 
the probability $p$ will occur not more than $m$ times. Show that 

$$P_m = e^{-\lambda}\left\{1 + \frac{\lambda}{1} + \frac{\lambda^2}{1\cdot 2} + \cdots + \frac{\lambda^m}{1\cdot 2\cdot 3\cdots m}\right\} + \Delta = Q_m + \Delta$$

where 

$$|\Delta| < (e^{\chi} - 1)Q_m \quad\text{if}\quad Q_m \ge \tfrac{1}{2}$$

$$|\Delta| < (e^{\chi} - 1)(1 - Q_m) \quad\text{if}\quad Q_m < \tfrac{1}{2}$$

and 


X4-J+- 
4 n 


X 


2(n - X) 
Indication of the Proof. We have 


l+-+V-F^ + - • • + 

q 1 • 2 g* 


Now, since g « 1 — ■ 




1 •2*3 • • • m 


(i '¥■ "'1 Cl '“‘V- 




2 a — k 

<^fc.O ^,2(n-X)^ 


Consequently 




P» < «*<?«; 0, 


= *'{i+I + l^+ • • • +rTTTTr;;;} 


On the other hand, 


-p.-= 2 


n(n - 1) • • • (n - M 4* 1) 


(- 00 - 0 (--^) 


1 - P« < e*(l - Q„) 

and 

Pm > «*Q« + I - e*. 

The final statement follows immediately from both inequalities obtained for Pm. 
10. With the usual notation, show that 




where 

mX (n —m)X» m(m— 1)| 
Q = e n 2n* 


r /(n — m)X* m* \1 

L* ~ A3(n - X)* 2n(n - m)}\’ ® 


< « < I. 


IndiccUion of the Proof. Referring to Chap. I, page 23, we have 


But 


whence 


(- 0 ' 


’■-<s('-r('-sr 

■ (-r 


^ (n —m)X* 

2n« 


< C 


m(in-l) 
2n . 


X- 


mX (n — m)X* m(m — 1) 


. e » 2n* 

m! 


2n 


On the other hand, 




. , »”»“X (n —m)X* (n —m)X* 
>. g ^ n-X"^ 2(n-X)» 3(n-X)* 


m(m — 1) 
g 2(n-m). 


Hence 


and a fortiori 


mX in — m)\* in(m-l) (n —m)X» m» 

> e n 2n« 2n . c 3(n-X)» 2n(n-m), 


3(n”- X)' 


[- 


2n(n 


— 1 

-m)J 


If $\lambda$ and $m$ are both small in comparison to $n$ the above-introduced factor $Q$ will be 
near 1. Under such circumstances we may be entitled to use an approximate formula 
due to Poisson 

$$T_m = \frac{\lambda^m}{m!}\,e^{-\lambda}.$$

The preceding elementary analysis gives means to estimate the error incurred by using 
this formula. 

11. Apply the preceding considerations to the case $n = 1{,}000$, $p = \frac{1}{100}$, $\lambda = 10$ 
and $m = 10$. Ans. $0.1256 < T_m < 0.1258$. Poisson’s formula gives 0.1251, a 
very good approximation. Also, $0.5807 < P_{10} < 0.5863$. Taking $P_{10} \approx 0.583$, the 
error in absolute value will be less than $3.3\cdot 10^{-3}$. By a more elaborate method it is 
found $P_{10} = 0.5830$. 
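
The figures of this problem are easily checked by direct computation. The sketch below is illustrative only; the function names `binom_pmf` and `poisson_pmf` are hypothetical, and the exact binomial terms are computed in ordinary floating-point arithmetic.

```python
from math import comb, exp, factorial

def binom_pmf(n, p, m):
    """Exact probability of m successes in n trials."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)

def poisson_pmf(lam, m):
    """Poisson's approximate expression exp(-lam) * lam^m / m!."""
    return exp(-lam) * lam**m / factorial(m)

if __name__ == "__main__":
    n, p, m = 1000, 0.01, 10
    lam = n * p
    print("T_10 exact  ", binom_pmf(n, p, m))
    print("T_10 Poisson", poisson_pmf(lam, m))
    print("P_10 exact  ", sum(binom_pmf(n, p, k) for k in range(m + 1)))
    print("Q_10 Poisson", sum(poisson_pmf(lam, k) for k in range(m + 1)))
```

The exact values fall inside the limits stated in the answer, and the Poisson sum reproduces the value 0.5830 to four decimals.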
CHAPTER VIII 


FURTHER CONSIDERATIONS ON GAMES OF CHANCE 

1. When a person undertakes to play a very large number of games 
under theoretically identical conditions, the inference to be drawn from 
Bernoulli's theorem is that that person will almost certainly be ruined 
if the mathematical expectation of his gain in a single game is negative. 
In case of a positive expectation, on the other hand, he is very likely to 
win as large a sum as he likes in a sufficiently long series of games. 
Finally, in an equitable game when the mathematical expectation of a 
gain is zero, the only inference to be drawn from Bernoulli’s theorem is 
that his gain or loss will likely be small in comparison with the number of 
games played. 

These conclusions are appropriate however, only if it is possible to 
continue the series of games indefinitely, with an agreement to postpone 
the final settling of accounts until the end of the series. But if the 
settlement, as in ordinary gambling, is made at the end of each game, 
it may happen that even playing a profitable game one will lose all his 
money and will have to discontinue playing long before the number of 
games becomes large enough to enable him to realize the advantages 
which continuation of the games would bring to him. 

A whole series of new problems arises in this connection, known as 
problems on the duration of play or ruin of gamblers. Since the science 
of probability had its humble origin in computing chances of players in 
different games, the important question of the ruin of gamblers was 
discussed at a very early stage in the historical development of the 
theory of probability. The simplest problem of this kind was solved by 
Huygens, who in this field had such great successors as de Moivre, 
Lagrange, and Laplace. 

2. It is natural to attack the problem first in its simplest aspect, and 
then to proceed to more involved and difficult questions. 

Problem 1. Two players $A$ and $B$ play a series of games, the 
probability of winning a single game being $p$ for $A$ and $q$ for $B$, and each game 
ends with a loss for one of them. If the loser after each game gives his 
adversary an amount representing a unit of money and the fortunes of 
$A$ and $B$ are measured by the whole numbers $a$ and $b$, what is the 
probability that $A$ (or $B$) will be ruined if no limit is set for the number of 
games? 
Solution. It is necessary first to show how we can attach a definite 
numerical value to the probability of the ruin of $A$ if no limit is set for 
the number of games. As in many similar cases (see, for instance, Prob. 
15, page 41) we start by supposing that a limit is set. Let $n$ be this 
limit. There is only a finite number of mutually exclusive ways in which 
$A$ can be ruined in $n$ games; either he can be ruined just after the first 
game, or just after the second, and so on. Denoting by $p_1, p_2, \ldots, p_n$ 
the probabilities for $A$ to be ruined just after the first, second, $\ldots$ $n$th 
game, the probability of his ruin before or at the $n$th game is 

$$p_1 + p_2 + \cdots + p_n.$$

Now, this sum being a probability, must remain $< 1$ whatever $n$ is. 
On the other hand, each term of this sum is $\ge 0$ for the same reason. 
Both remarks combined, show that the series 

$$p_1 + p_2 + p_3 + \cdots$$

is convergent. We take its sum as the probability for $A$ to be ruined 
when nothing limits the number of games played. So it is clear that 
this probability, although unknown, possesses a perfectly determined 
numerical value. Let us denote by $y_x$ the probability for $A$ to be ruined 
when his fortune is $x$. The probability we seek is $y_a$. Obviously, 

(1) $$y_0 = 1,$$

for $A$ is certainly ruined if he has no money left. Similarly 

(2) $$y_{a+b} = 0$$

because if the fortune of $A$ is $a + b$, it means that $B$ has no money 
wherewith to play, and certainly the ruin of $A$ is then impossible. Further, 
considering the result of the game immediately following the situation 
in which the fortune of $A$ amounted to $x$ it is possible to establish an 
equation in finite differences which $y_x$ satisfies. For, if $A$ wins this game 
(the probability of which case is $p$), his fortune becomes $x + 1$ and the 
probability of being ruined later is $y_{x+1}$. By the theorem of compound 
probability, the probability of this case is $py_{x+1}$. But if $A$ loses (the 
probability of which is $q$), his fortune becomes $x - 1$ and the probability 
that the one possessing this fortune will be ruined is $y_{x-1}$. The 
probability of this case is $qy_{x-1}$. Now, applying the theorem of total 
probability, we arrive at the equation 

(3) $$y_x = py_{x+1} + qy_{x-1}.$$

This equation has a particular solution of the form $\alpha^x$ where $\alpha$ is a 
root of the equation 

$$\alpha = p\alpha^2 + q.$$
If $p \ne q$ there are two roots 

$$\alpha = 1 \quad\text{and}\quad \alpha = \frac{q}{p}$$

and, correspondingly, there are two distinct particular solutions of 
equation (3): 

$$1 \quad\text{and}\quad \left(\frac{q}{p}\right)^x.$$

Obviously, 

$$y_x = C + D\left(\frac{q}{p}\right)^x$$

is also a solution of (3) for arbitrary $C$ and $D$. Now, we can dispose of 
$C$ and $D$ so as to satisfy conditions (1) and (2). To this end we have the 
equations 

$$C + D = 1$$
$$p^{a+b}C + q^{a+b}D = 0$$

whence 

$$C = \frac{-q^{a+b}}{p^{a+b} - q^{a+b}}, \qquad D = \frac{p^{a+b}}{p^{a+b} - q^{a+b}}$$

and 

$$y_x = \frac{q^{a+b}p^x - p^{a+b}q^x}{p^x(q^{a+b} - p^{a+b})}.$$

It remains to take $x = a$ to obtain the required probability 

$$y_a = \frac{q^a(q^b - p^b)}{q^{a+b} - p^{a+b}} = \frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}}$$

that the player $A$ possessing the fortune $a$ will be ruined. Similarly, 
the probability of the ruin of $B$ is 

$$z_b = \frac{p^b(p^a - q^a)}{p^{a+b} - q^{a+b}}.$$

It turns out that 

$$y_a + z_b = 1,$$


so that the probability that the series of games will continue indefinitely 
without A or B being ruined, is 0. The probability 0 does not show the 
impossibility of an eternal game, because this number was obtained, 
not by direct enumeration of cases, but by passage to the limit. 
Theoretically, an eternal game is not excluded. Actually, of course, this 
possibility can be disregarded. 

If p = q = 1/2, so that each single game is equitable, the preceding 
solution must be modified. In this case, the above quadratic equation 
in \(\alpha\) has two coincident roots, both equal to 1, and we have only one 
particular solution of (3), \(y_x = 1\). But another particular solution in this 
case is \(y_x = x\), so that we can assume 

\[y_x = C + Dx\]

and determine C and D from the equations 

\[C = 1; \qquad C + D(a + b) = 0.\]

Thus, we find that 

\[y_x = 1 - \frac{x}{a+b}\]

and for x = a 

\[y_a = \frac{b}{a+b}.\]

Similarly, giving \(z_b\) the same meaning as above, 

\[z_b = \frac{a}{a+b}.\]

If, therefore, each single game is equitable, the probabilities of ruin are 
inversely proportional to the fortunes of the players. The practical 
conclusion to be derived from this theoretical result is sheer common 
sense: It is unwise to play indefinitely with an adversary whose fortune 
is very large without submitting oneself to the great risk of losing all 
one's money in the course of the games, even if each single game is 
equitable. Gamblers who gamble at an even game with any willing 
individual are in the same condition as if they were gambling with an 
infinitely rich adversary. Their ruin in the long run is practically 
certain. 
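
The formulas for the probabilities of ruin can be checked numerically. The following Python sketch is ours, not part of the original exposition; the function names are hypothetical, and conditions (1) and (2) are assumed to be \(y_0 = 1\) and \(y_{a+b} = 0\), as the equations for C and D suggest. It evaluates the closed forms in exact rational arithmetic and compares \(y_a\) with the value found by solving equation (3) directly.

```python
from fractions import Fraction


def ruin_probabilities(p, a, b):
    """Closed forms for (y_a, z_b) with unit stakes; covers p = 1/2 as well."""
    p = Fraction(p)
    q = 1 - p
    if p == q:  # equitable games
        return Fraction(b, a + b), Fraction(a, a + b)
    denom = p ** (a + b) - q ** (a + b)
    y_a = q ** a * (p ** b - q ** b) / denom
    z_b = p ** b * (p ** a - q ** a) / denom
    return y_a, z_b


def ruin_by_recurrence(p, a, b):
    """Solve y_x = p*y_{x+1} + q*y_{x-1} with y_0 = 1, y_{a+b} = 0."""
    p = Fraction(p)
    q = 1 - p
    n = a + b
    # Write y_x = u_x + s*v_x, where s = y_1 is unknown (a shooting method).
    u = [Fraction(1), Fraction(0)]
    v = [Fraction(0), Fraction(1)]
    for x in range(1, n):
        u.append((u[x] - q * u[x - 1]) / p)
        v.append((v[x] - q * v[x - 1]) / p)
    s = -u[n] / v[n]  # enforce y_{a+b} = 0
    return u[a] + s * v[a]


if __name__ == "__main__":
    y, z = ruin_probabilities(Fraction(2, 3), 3, 4)
    print(y, z, y + z)
    print(ruin_by_recurrence(Fraction(2, 3), 3, 4))
```

Exact fractions are used so that the identity \(y_a + z_b = 1\) can be tested without rounding error.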

If single games of the series are not equitable, that is, p ≠ q, the 
conclusion may be different. Supposing p > q, we have a case when 
the expectation of A is positive; in each single game, A has an advantage 
over his adversary. The above expression for \(y_a\) may be written in the 
form 

\[y_a = \left(\frac{q}{p}\right)^a \frac{1 - \left(\frac{q}{p}\right)^b}{1 - \left(\frac{q}{p}\right)^{a+b}}\]

and, because q/p < 1, it is easy to see that \(y_a\) remains always less than 
\((q/p)^a\) and converges to this number when b becomes infinite. Thus, playing a 
series of advantageous games even against an infinitely rich adversary, 
the probability of escaping ruin is 

\[1 - \left(\frac{q}{p}\right)^a.\]

If a is large enough, this can be made as near 1 as we please, so that a 
player with a large fortune has good reason to believe that in the course 
of the games he will never be ruined, but that actually he is very likely 
to win a large sum of money. 
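
A short numerical illustration may help here. The sketch below is ours; the values p = 0.55 and a = 20 are arbitrary assumptions chosen for illustration. It shows how the probability of ruin in advantageous games grows with the fortune b of the adversary while staying below the limiting value \((q/p)^a\).

```python
def ruin_probability(p, a, b):
    """y_a for unit stakes and p > q, written in terms of r = q/p."""
    r = (1 - p) / p
    return r ** a * (1 - r ** b) / (1 - r ** (a + b))


def ruin_limit(p, a):
    """Limit of y_a when the adversary's fortune b becomes infinite."""
    return ((1 - p) / p) ** a


if __name__ == "__main__":
    p, a = 0.55, 20  # illustrative values only
    for b in (10, 100, 1000):
        print(b, ruin_probability(p, a, b))
    print("limit", ruin_limit(p, a))
```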

This conclusion again is confirmed by experience. Big gambling 
institutions, like the Casino at Monte Carlo, always reserve certain 
advantages to themselves, and, although they are willing to play with 
practically everybody (as if they played against an infinitely rich adver¬ 
sary) the chance of their being ruined is slight because of the large 
capital in their possession. 

3. In the problem solved above the stakes of both players were 
supposed to be equal, and we took them as units to measure the fortunes 
of both players. Next it would be interesting to investigate the case in 
which the stakes of A and B are unequal. An exact solution of this 
modified problem, since it depends on a difference equation of higher 
order, would be too complicated to be of practical use. It is therefore 
extremely interesting that, following an ingenious method developed by 
A. A. Markoff, one can establish simple inequalities for the required 
probabilities which give a good approximation if the fortunes of the 
players are large in comparison with their stakes. 

Problem 2. If the conditions presupposed in Prob. 1 are modified, 
in that the stakes of A and B measured in a convenient unit are \(\alpha\) and \(\beta\) 
and their respective fortunes are a and b, find the probabilities for A or 
B to be ruined in the sense that at a certain stage the capital of A will 
become less than \(\alpha\) or that of B less than \(\beta\). 

Solution. Let \(y_x\) be the probability for A to be forced out of the 
game by the lack of sufficient money to set a full stake \(\alpha\) when his 
fortune amounts to x and consequently that of his adversary is a + b − x. 
In the same way as before, we find that \(y_x\) is a solution of the equation 
in finite differences: 

\[y_x = p\,y_{x+\beta} + q\,y_{x-\alpha}. \tag{4}\]

To determine \(y_x\) completely, in addition to (4), we have two sets of 
supplementary conditions: 

\[y_0 = y_1 = \cdots = y_{\alpha-1} = 1 \tag{5}\]

\[y_{a+b} = y_{a+b-1} = \cdots = y_{a+b-(\beta-1)} = 0. \tag{6}\]

Equation (5) expresses the fact that if the fortune of A becomes less 
than his stake, it is certain that A must quit. On the contrary, equation 
(6) indicates the impossibility for A to be ruined if the other player B 
does not have enough money to continue gaming. Equation (4) is an 
ordinary equation in finite differences of the order \(\alpha + \beta\). It has 
particular solutions of the form \(\theta^x\) where \(\theta\) is a root of the equation 

\[p\,\theta^{\alpha+\beta} - \theta^{\alpha} + q = 0. \tag{7}\]

The left-hand member for \(\theta = 0\) is positive and with increasing \(\theta\) 
decreases and attains a minimum when 

\[p\,\theta^{\beta} = \frac{\alpha}{\alpha + \beta}\]

and then steadily increases and assumes positive values for large \(\theta\). 
This minimum must be negative or zero because \(\theta = 1\) is a root of (7). 
Now, if it is negative, there are two positive roots of (7). One of them 
is \(\theta = 1\) and another > or < 1 according as 

\[p < \frac{\alpha}{\alpha + \beta} \quad\text{or}\quad p > \frac{\alpha}{\alpha + \beta},\]

or else 

\[p\beta - q\alpha < 0 \quad\text{or}\quad > 0.\]

That is, the positive root of (7) different from 1 is > 1 when single games 
are favorable to B and < 1 if they are favorable to A. In case of equitable 
games, both positive roots coincide and \(\theta = 1\) is a double root of (7). 
All the other roots of (7) are negative or imaginary. 

The regular way to solve the problem would be to write down the 
general solution of (4) involving \(\alpha + \beta\) arbitrary constants to be 
determined by conditions (5) and (6). As this method would lead to a 
complicated expression for \(y_x\), we shall refrain from seeking the exact solution 
of our problem, and instead, following A. A. Markoff's ingenious remark, 
we shall establish simple lower and upper limits for \(y_x\) which are close 
enough if the fortunes of the players are large in comparison with their 
stakes. 

Lemma. If \(y_x\) is a solution of equation (4) and none of the numbers 

\[y_0,\ y_1,\ \ldots,\ y_{\alpha-1}\]

\[y_{a+b},\ y_{a+b-1},\ \ldots,\ y_{a+b-\beta+1}\]

is negative, then \(y_x \ge 0\) for x = 0, 1, 2, . . . a + b. 

Proof. Let \(u_x^{(k)}\) (k = 0, 1, 2, . . . \(\alpha\) − 1) represent the probability 
that the player A whose actual fortune is x (and that of his adversary 
a + b − x) will be forced to quit when his fortune becomes exactly = k. 
Evidently \(u_x^{(k)}\) is a solution of equation (4) satisfying the conditions 

\[u_x^{(k)} = 0 \ \text{for}\ x = 0, 1, \ldots, k-1,\ k+1, \ldots, \alpha-1;\ a+b,\ a+b-1, \ldots, a+b-\beta+1; \qquad u_k^{(k)} = 1.\]

Similarly, if \(v_x^{(l)}\) (l = 0, 1, 2, . . . \(\beta\) − 1) represents the probability that 
the player B will be forced to quit when the fortune of A becomes exactly 
= a + b − l, \(v_x^{(l)}\) will be a solution of (4) satisfying the conditions 

\[v_x^{(l)} = 0 \ \text{for}\ x = 0, 1, 2, \ldots, \alpha-1;\ a+b, \ldots, a+b-l+1,\ a+b-l-1, \ldots, a+b-\beta+1; \qquad v_{a+b-l}^{(l)} = 1.\]

Thus we get \(\alpha + \beta\) particular solutions of (4), and it is almost evident 
that these solutions are independent. Moreover, since they represent 
probabilities, \(u_x^{(k)} \ge 0\), \(v_x^{(l)} \ge 0\) for x = 0, 1, 2, . . . a + b. Now, any 
solution \(y_x\) of (4) with given values of 

\[y_0,\ y_1,\ \ldots,\ y_{\alpha-1}\]

\[y_{a+b},\ y_{a+b-1},\ \ldots,\ y_{a+b-\beta+1}\]

can be represented thus 

\[y_x = \sum_{k=0}^{\alpha-1} y_k\,u_x^{(k)} + \sum_{l=0}^{\beta-1} y_{a+b-l}\,v_x^{(l)}.\]

Hence, \(y_x \ge 0\) for x = 0, 1, 2, . . . a + b if none of the numbers 

\[y_0,\ y_1,\ \ldots,\ y_{\alpha-1}\]

\[y_{a+b},\ y_{a+b-1},\ \ldots,\ y_{a+b-\beta+1}\]

is negative. This interesting property of the solutions of equation (4) 
derived almost intuitively from the consideration of probabilities can be 
established directly. (See Prob. 9, page 160.) 

The lemma just proved yields almost immediately the following 
proposition: If for any two solutions \(y'_x\) and \(y''_x\) of equation (4) the 
inequality 

\[y''_x \ge y'_x\]

holds for 

\[x = 0, 1, 2, \ldots, \alpha-1;\ a+b,\ a+b-1, \ldots, a+b-\beta+1,\]

the same inequality will be true for all x = 0, 1, 2, . . . a + b. It 
suffices to notice that \(y_x = y''_x - y'_x\) is a solution of the linear equation 
(4) and, by hypothesis, \(y_x \ge 0\) for x = 0, 1, 2, . . . \(\alpha\) − 1; a + b, 
a + b − 1, . . . a + b − \(\beta\) + 1. 

Now we can come back to our problem. First, if the mathematical 
expectation of A 

\[p\beta - q\alpha\]

is different from 0, equation (7) has two positive roots: 1 and \(\theta\). With 
arbitrary constants C and D 

\[y'_x = C + D\theta^x\]

is a solution of (4). Whatever C and D may be, \(y'_x\) as a function of x 
varies monotonically. Therefore, if C and D are determined by the 
conditions 

\[y'_0 = 1, \qquad y'_{a+b-\beta+1} = 0\]

we shall have 

\[y'_x \le 1 \quad\text{if}\quad x = 0, 1, 2, \ldots, \alpha-1\]

\[y'_x \le 0 \quad\text{if}\quad x = a+b-\beta+1, \ldots, a+b\]

and by the above established lemma, taking into account conditions (5) 
and (6), we shall have for the required probability the following inequality 

\[y_x \ge y'_x\]

or, substituting the explicit expression for \(y'_x\), 

\[y_x \ge \frac{\theta^{a+b-\beta+1} - \theta^x}{\theta^{a+b-\beta+1} - 1}.\]

If, on the contrary, C and D are determined by 

\[y'_{\alpha-1} = 1, \qquad y'_{a+b} = 0\]

we shall have 

\[y'_x \ge 1 \quad\text{if}\quad x = 0, 1, 2, \ldots, \alpha-1\]

\[y'_x \ge 0 \quad\text{if}\quad x = a+b-\beta+1, \ldots, a+b\]

and 

\[y_x \le \frac{\theta^{a+b-\alpha+1} - \theta^{x-\alpha+1}}{\theta^{a+b-\alpha+1} - 1}.\]

Finally, taking x = a, we obtain the following limits for the initial 
probability \(y_a\): 

\[\frac{\theta^{a+b-\beta+1} - \theta^{a}}{\theta^{a+b-\beta+1} - 1} \le y_a \le \frac{\theta^{a+b-\alpha+1} - \theta^{a-\alpha+1}}{\theta^{a+b-\alpha+1} - 1}.\]

They give a sufficient approximation to \(y_a\) if a and b are large 
compared with \(\alpha\) and \(\beta\). 
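
To see how close these limits are in practice, one may compute them and compare with a direct numerical solution of equation (4). The Python sketch below is ours and rests on assumptions that should be stated plainly: the root \(\theta\) of (7) is located by simple bisection, the "exact" value is obtained by repeated substitution in (4) under conditions (5) and (6) until the values settle, and the function names are hypothetical. Only the case of games that are not equitable is treated.

```python
def other_positive_root(p, alpha, beta):
    """Positive root theta != 1 of p*t**(alpha+beta) - t**alpha + q = 0."""
    q = 1 - p
    f = lambda t: p * t ** (alpha + beta) - t ** alpha + q
    t_min = (alpha / (p * (alpha + beta))) ** (1 / beta)  # minimum of f
    if t_min < 1:
        pos = 0.0            # f(0) = q > 0, root lies in (0, t_min)
    else:
        pos = 2 * t_min      # root lies to the right of the minimum
        while f(pos) <= 0:
            pos *= 2
    neg = t_min              # f is negative at its minimum
    for _ in range(200):     # bisection between a negative and a positive point
        mid = (neg + pos) / 2
        if f(mid) < 0:
            neg = mid
        else:
            pos = mid
    return (neg + pos) / 2


def markov_limits(p, alpha, beta, a, b):
    """Lower and upper limits for y_a (games not equitable)."""
    th = other_positive_root(p, alpha, beta)
    n_lo = a + b - beta + 1
    n_hi = a + b - alpha + 1
    lower = (th ** n_lo - th ** a) / (th ** n_lo - 1)
    upper = (th ** n_hi - th ** (a - alpha + 1)) / (th ** n_hi - 1)
    return lower, upper


def ruin_by_iteration(p, alpha, beta, a, b, sweeps=5000):
    """Numerical solution of (4) with conditions (5) and (6)."""
    q = 1 - p
    n = a + b
    y = [1.0 if x < alpha else 0.0 for x in range(n + 1)]
    for _ in range(sweeps):
        for x in range(alpha, n - beta + 1):
            y[x] = p * y[x + beta] + q * y[x - alpha]
    return y[a]


if __name__ == "__main__":
    args = (0.5, 3, 2, 30, 20)  # p, stakes 3 and 2, fortunes 30 and 20
    print(markov_limits(*args), ruin_by_iteration(*args))
```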

If each single game is equitable, equation (4) has a solution with two 
arbitrary constants: 

\[y'_x = C + Dx.\]

Proceeding in the same way as before, we obtain the inequalities 

\[\frac{b - \beta + 1}{a + b - \beta + 1} \le y_a \le \frac{b}{a + b - \alpha + 1}.\]

4. To simplify the analysis, it was supposed that nothing limited the 
number of games played by A and B so that an eternal game, although 
extremely improbable, was theoretically possible. We now turn to 
problems in which the number of games is limited. 

Problem 3. Players A and B agree to play not more than n games. 
The probabilities of winning a single game are p and q, respectively, and 
the stakes are equal. Taking these stakes as monetary units, the fortune 
of A is measured by the whole number a and that of B is infinite or at 
least so large that he cannot be ruined in n games. What is the proba¬ 
bility for A to be ruined in the course of n games? 

Solution. Let \(y_{x,t}\) represent the probability for A to be ruined when 
his fortune is measured by the number x and he cannot play more than 
t games. The reasoning we have used several times shows that \(y_{x,t}\) 
satisfies a partial equation in finite differences: 

\[y_{x,t} = p\,y_{x+1,t-1} + q\,y_{x-1,t-1}. \tag{8}\]

Moreover, if A has no money left, his ruin is certain, which gives the 
condition 

\[y_{0,t} = 1 \quad\text{if}\quad t \ge 0. \tag{9}\]

On the other hand, if A still possesses money and cannot play any more, 
his ruin is impossible, so that 

\[y_{x,0} = 0 \quad\text{if}\quad x > 0. \tag{10}\]

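Conditions (9) and (10) together with equation (8) determine \(y_{x,t}\) 
completely for all positive values of x and t. To find an explicit 
expression for \(y_{x,t}\) we shall use Lagrange's method. Equation (8) has particular 
solutions of the form 

\[\alpha^x\beta^t\]

where \(\alpha\) and \(\beta\) satisfy the relation 

\[\alpha\beta = p\alpha^2 + q.\]

We can solve this equation either for \(\beta\) or for \(\alpha\), which leads to two different 
expressions of \(y_{x,t}\). Solving for \(\beta\) we have infinitely many particular 
solutions 

\[\alpha^x(p\alpha + q\alpha^{-1})^t\]

with an arbitrary \(\alpha\) and we can seek to obtain the required solution in the 
form 

\[y_{x,t} = \frac{1}{2\pi i}\int_c \alpha^{x-1}(p\alpha + q\alpha^{-1})^t f(\alpha)\,d\alpha\]

where \(f(\alpha)\) is supposed to be developable in Laurent's series on a certain 
circle c. To satisfy (10) we must have 

\[\frac{1}{2\pi i}\int_c \alpha^{x-1} f(\alpha)\,d\alpha = 0 \quad\text{for}\quad x = 1, 2, 3, \ldots\]

which shows that \(f(\alpha)\) is regular within the circle c. To determine \(f(\alpha)\) 
completely, we must have, according to (9) 

\[\frac{1}{2\pi i}\int_c (p\alpha + q\alpha^{-1})^t f(\alpha)\,\frac{d\alpha}{\alpha} = 1 \quad\text{for}\quad t = 0, 1, 2, \ldots.\]

All these equations are equivalent to a single equation 

\[\frac{1}{2\pi i}\int_c \frac{f(\alpha)\,d\alpha}{\alpha - p\epsilon\alpha^2 - q\epsilon} = \frac{1}{1 - \epsilon}\]

holding good for all sufficiently small \(\epsilon\). The integrand has a single pole 
\(\alpha_0\) within c defined by 

\[\alpha_0 - p\epsilon\alpha_0^2 - q\epsilon = 0,\]

and the corresponding residue is 

\[\frac{f(\alpha_0)}{1 - 2p\epsilon\alpha_0}.\]

But this must be equal to 

\[\frac{1}{1 - \epsilon}\]

or, substituting for \(\epsilon\) its expression in \(\alpha_0\), to 

\[\frac{q + p\alpha_0^2}{p\alpha_0^2 - \alpha_0 + q},\]

and hence for all sufficiently small \(\alpha_0\) 

\[f(\alpha_0) = \frac{q - p\alpha_0^2}{p\alpha_0^2 - \alpha_0 + q};\]

that is, if 

\[f(\alpha) = \frac{q - p\alpha^2}{p\alpha^2 - \alpha + q}\]

all the requirements are satisfied. Taking into account that p + q = 1, 
we have 

\[f(\alpha) = \frac{q - p\alpha^2}{(1 - \alpha)(q - p\alpha)} = \frac{1}{1 - \alpha} + \frac{1}{1 - \frac{p}{q}\alpha} - 1\]

and also 

\[f(\alpha) = 1 + \sum_{n=1}^{\infty}\left[1 + \left(\frac{p}{q}\right)^n\right]\alpha^n.\]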





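The expression for \(y_{x,t}\) is therefore 

\[y_{x,t} = \sum_{n=0}^{\infty} \frac{C_n}{2\pi i}\int_c \alpha^{x+n-1}(p\alpha + q\alpha^{-1})^t\,d\alpha\]

where \(C_0 = 1\) and \(C_n = 1 + (p/q)^n\) if n ≥ 1. 

It remains to find the coefficient of \(1/\alpha\) in the development of the 
integrand in a series of descending powers of \(\alpha\). Since 

\[\alpha^{x+n-1}(p\alpha + q\alpha^{-1})^t = \sum_{l=0}^{t} C_t^l\,p^l q^{t-l}\,\alpha^{x+n-1-t+2l},\]

where \(C_t^l\) denotes the binomial coefficient, this coefficient is given by the sum 

\[\sum_{l} C_t^l\,p^l q^{t-l}\,C_{t-x-2l}\]

extended over all integers l from 0 up to the greatest integer not exceeding 
\(\frac{t-x}{2}\). Hence, the final expression for the probability \(y_{a,n}\) is 

\[y_{a,n} = q^a \sum_{l=0}^{\frac{n-a}{2}} C_n^l\,(pq)^l\left[p^{\,n-a-2l} + q^{\,n-a-2l}\right] \tag{11}\]

with the agreement, in case of an even n − a, to replace the sum 
\(p^0 + q^0\) corresponding to \(l = \frac{n-a}{2}\) by 1. It is natural that the right-hand 
member of the preceding expression should be replaced by 0 if n < a, 
which is in perfect agreement with the fact that A cannot be ruined in less 
than a games. 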
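
Formula (11) can be compared with a direct computation from the difference equation (8). In the Python sketch below (ours; the function names are hypothetical) the recurrence is stepped forward from the conditions (9) and (10), using the fact that a fortune larger than the number of remaining games cannot be exhausted, and the result is set beside the closed form.

```python
from math import comb


def ruin_within_by_recurrence(p, a, n):
    """y_{a,n} obtained by stepping equation (8) from conditions (9), (10)."""
    q = 1 - p
    width = a + n + 1              # the fortune cannot exceed a + n in n games
    y = [1.0] + [0.0] * width      # t = 0: y_{0,0} = 1, y_{x,0} = 0 for x > 0
    for _ in range(n):
        new = [1.0] + [0.0] * width
        for x in range(1, width):
            new[x] = p * y[x + 1] + q * y[x - 1]
        y = new
    return y[a]


def ruin_within_formula_11(p, a, n):
    """y_{a,n} from formula (11)."""
    q = 1 - p
    if n < a:
        return 0.0
    total = 0.0
    for l in range((n - a) // 2 + 1):
        e = n - a - 2 * l
        bracket = 1.0 if e == 0 else p ** e + q ** e  # p^0 + q^0 is replaced by 1
        total += comb(n, l) * (p * q) ** l * bracket
    return q ** a * total


if __name__ == "__main__":
    print(ruin_within_by_recurrence(0.5, 10, 20))
    print(ruin_within_formula_11(0.5, 10, 20))
```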

The second form of solution is obtained if we express \(\alpha\) as a function of 
\(\beta\). The equation 

\[p\alpha^2 - \alpha\beta + q = 0\]

having two roots, we shall take for \(\alpha\) the root 

\[\alpha = \frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\]

determined by the condition that it vanishes for infinitely large positive 
\(\beta\) and can be developed in power series of \(1/\beta\) when \(|\beta| > 2\sqrt{pq}\). Using 
\(\alpha\) in this perfectly determined sense, it is easy to verify that 

\[y_{x,t} = \frac{1}{2\pi i}\int_c \left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x \frac{\beta^t\,d\beta}{\beta - 1},\]

where c is a circle of radius > 1 described from 0 as its center, satisfies all 
the requirements. For it is a solution of equation (8). Next, for x = 0 
and t ≥ 0, 

\[y_{0,t} = \frac{1}{2\pi i}\int_c \beta^{t-1}\left(1 + \frac{1}{\beta} + \frac{1}{\beta^2} + \cdots\right)d\beta = 1\]

and, finally, for t = 0 and x > 0 

\[y_{x,0} = \frac{1}{2\pi i}\int_c \left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x \frac{d\beta}{\beta - 1} = 0\]

because the development of the integrand into power series of \(1/\beta\) 
starts at least with the second power of \(1/\beta\). 

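To find \(y_{x,t}\) in explicit form, it remains to find the coefficient of \(1/\beta\) 
in the development of 

\[\left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x \frac{\beta^t}{\beta - 1}\]

in a series of descending powers of \(\beta\). Let 

\[\left(\frac{\beta - \sqrt{\beta^2 - 4pq}}{2p}\right)^x = \frac{l_x}{\beta^x} + \frac{l_{x+1}}{\beta^{x+1}} + \frac{l_{x+2}}{\beta^{x+2}} + \cdots;\]

multiplying this series by 

\[\frac{\beta^t}{\beta - 1} = \beta^{t-1} + \beta^{t-2} + \beta^{t-3} + \cdots\]

we find that the coefficient of \(1/\beta\) in the product is 

\[l_x + l_{x+1} + \cdots + l_t,\]

and hence 

\[y_{x,t} = l_x + l_{x+1} + \cdots + l_t\]

provided t ≥ x, for otherwise \(y_{x,t} = 0\). The quadratic equation in \(\alpha\) 
can be written in the form 

\[\alpha = \frac{1}{\beta}(q + p\alpha^2)\]

and the development of any power of its root vanishing for \(\beta = \infty\) into 
power series of \(1/\beta\) can be obtained by application of Lagrange's series. 
We have 

\[\alpha^x = \sum_{n=1}^{\infty} \frac{x\,\beta^{-n}}{n!}\left[\frac{d^{n-1}\,(q + p\xi^2)^n\,\xi^{x-1}}{d\xi^{n-1}}\right]_{\xi=0}.\]

But 

\[\frac{1}{n!}\left[\frac{d^{n-1}\,(q + p\xi^2)^n\,\xi^{x-1}}{d\xi^{n-1}}\right]_{\xi=0} = \frac{(x + 2i - 1)!}{i!\,(x + i)!}\,p^i q^{x+i}\]

if n = x + 2i, and = 0 if n = x + 2i + 1. Hence, 

\[l_{x+2i} = \frac{x\,(x + 2i - 1)!}{i!\,(x + i)!}\,p^i q^{x+i}, \qquad l_{x+2i+1} = 0,\]

and finally 

\[y_{a,n} = q^a\left[1 + \frac{a}{1}\,pq + \frac{a(a+3)}{1\cdot 2}(pq)^2 + \frac{a(a+4)(a+5)}{1\cdot 2\cdot 3}(pq)^3 + \cdots + \frac{a(a+k+1)(a+k+2)\cdots(a+2k-1)}{1\cdot 2\cdots k}(pq)^k\right] \tag{12}\]

where \(k = \frac{n-a}{2}\) or \(k = \frac{n-a-1}{2}\) according as n and a are of the 
same parity or not. 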
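
Formula (12) lends itself to direct computation. The following Python sketch is ours; it assumes that the coefficient of \((pq)^i\) may be written as \(\frac{a}{a+2i}C_{a+2i}^{i}\), which agrees with \(a(a+i+1)\cdots(a+2i-1)/i!\) term by term. As an illustration it evaluates the case a = 10, n = 20, p = q = 1/2.

```python
from math import comb


def ruin_within_formula_12(p, a, n):
    """y_{a,n} from formula (12); the sum runs over i = 0, 1, ..., k."""
    q = 1 - p
    if n < a:
        return 0.0
    k = (n - a) // 2
    total = 0.0
    for i in range(k + 1):
        # a*(a+i+1)...(a+2i-1)/i!  equals  a*C(a+2i, i)/(a+2i), an integer
        coefficient = a * comb(a + 2 * i, i) // (a + 2 * i)
        total += coefficient * (p * q) ** i
    return q ** a * total


if __name__ == "__main__":
    y = ruin_within_formula_12(0.5, 10, 20)
    print("ruined within 20 games:", y)
    print("not ruined:", 1 - y)
```

The second printed number can be set beside the answer quoted for Prob. 4 at the end of the chapter.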

5. The difference \(y_{a,n} - y_{a,n-1}\) gives the probability for the player A 
to be ruined at exactly the nth game and not before. Now, this difference 
is 0 if n differs from a by an odd number, so that the probability of 
ruin at the (a + 2i − 1)st game is 0. That is almost evident because 
after every game the fortune of A is increased or diminished by 1 and 
therefore can be reduced to 0 only if the number of games played is of 
the same parity as a. If n = a + 2i, the difference \(y_{a,n} - y_{a,n-1}\) is 

\[\frac{a(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdot 3 \cdots i}\,p^i q^{a+i}.\]


Such, therefore, is the probability for A to be ruined at exactly the 
(a + 2i)th game. The remarkable simplicity of this expression obtained 
by means which are not quite elementary leads to a suspicion that it 
might also be obtained in a simple way. And, indeed, there is a simple 
way to arrive at this expression and thus to have a third, elementary, 
solution of Prob. 3. 

Considering the possible results of a series of a + 2i games, let A 
stand for a game won by A, and B for a game lost by A. The result of 
every series will thus be represented by a succession of letters A and B. 
We are interested in finding all the sequences which ruin A at exactly 
the last game. Because the fortune of A sinks from a to 0 there must be 
i letters A and i + a letters B in every sequence we consider. Besides, 
there is another important condition. Let us imagine that the sequence 
is divided into two arbitrary parts, one containing the first letter and 
another the last letter of the sequence. Let x be the number of letters B, 
and y that of letters A in the second or right part of the sequence. There 
will be a + i − x letters B and i − y letters A in the first or left part. 
It means that the fortune of A after a game corresponding to the last 
letter in the left part, becomes 

\[a + i - y - (a + i - x) = x - y\]

and since A cannot be ruined before the (a + 2i)th game, x must always 
be > y. That is, counting letters A and B from the right end of the 
sequence, the number of letters B must surpass the number of letters A 
at every stage. Conversely, if this condition is satisfied the succession 
represents a series of games resulting in the ruin of A at the end of the 
series and not before. 

To find directly the number of sequences satisfying this requirement 
is not so easy, and it is much easier, following an ingenious method 
proposed by D. André, to find the number of all the remaining sequences 
of i letters A and i + a letters B. These can be divided into two classes: 
those ending with A and those ending with B. Now, it is easy to show 
that there exists a one-to-one correspondence between successions of these 
two classes, so that both classes contain the same number of sequences. 
For, in a sequence of the second class (ending with B) starting from 
the right end, we necessarily find a shortest group of letters containing 
A and B in equal numbers. This group must end with A. Writing 
letters of this group in reverse order without changing the preceding 
letters, we obtain a sequence of the first class ending with A. 
Conversely, in a sequence of the first class there is a shortest group at the 
right end ending with B and containing an equal number of letters A and 
B. Writing letters of this group in reverse order, we obtain a sequence 
of the second class. 

An example will illustrate the described manner of establishing the 
one-to-one correspondence between sequences of the first and of the 
second class. Consider a sequence of the first kind 

B|BBABAA. 

The vertical bar separates the shortest group from the right containing 
letters A and B in equal numbers. Reversing the order of letters in this 
group, we obtain a sequence of the second class 

B|AABABB 

and this sequence, by application of the above rule, is transformed again 
into the original sequence of the first class. The number of sequences 
of the first class can now be easily found. It is the same as the number of 
all possible sequences of i − 1 letters A and a + i letters B, that is, 

\[\frac{(a + 2i - 1)!}{(i - 1)!\,(a + i)!} = \frac{(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots (i - 1)}.\]

The total number of sequences in both classes is 

\[2\,\frac{(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots (i - 1)}.\]

Hence, the number of sequences leading to ruin of A in exactly a + 2i 
games is 

\[\frac{(a + i + 1)(a + i + 2) \cdots (a + 2i)}{1 \cdot 2 \cdots i} - 2\,\frac{(a + i + 1)(a + i + 2) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots (i - 1)} = \frac{a(a + i + 1) \cdots (a + 2i - 1)}{1 \cdot 2 \cdots i}.\]

As the probability of gains and losses indicated by every such sequence 
is the same, namely, \(p^i q^{a+i}\), the probability of the ruin of A in exactly 
a + 2i games is 

\[\frac{a(a + i + 1) \cdots (a + 2i - 1)}{1 \cdot 2 \cdot 3 \cdots i}\,p^i q^{a+i}\]

and hence the second expression found for \(y_{a,n}\) follows immediately. 
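
For small a and i the count of sequences can be confirmed by direct enumeration. The Python sketch below is ours; the function names are hypothetical. It lists every succession of a + 2i games, keeps those that ruin A at the last game and not before, and compares their number with \(a(a+i+1)\cdots(a+2i-1)/(1\cdot 2\cdots i)\), written here in the equivalent form \(\frac{a}{a+2i}C_{a+2i}^{i}\).

```python
from itertools import product
from math import comb


def count_ruin_at_last_game(a, i):
    """Enumerate sequences of a + 2i games ruining A exactly at the last game."""
    n = a + 2 * i
    count = 0
    for seq in product((1, -1), repeat=n):  # +1: game won by A, -1: game lost
        fortune = a
        for step, outcome in enumerate(seq, start=1):
            fortune += outcome
            if fortune == 0:
                break                        # A is ruined at this step
        if fortune == 0 and step == n:
            count += 1
    return count


def count_by_formula(a, i):
    """a*(a+i+1)...(a+2i-1)/i!, written as a*C(a+2i, i)/(a+2i)."""
    return a * comb(a + 2 * i, i) // (a + 2 * i)


if __name__ == "__main__":
    for a in (1, 2, 3):
        for i in range(4):
            print(a, i, count_ruin_at_last_game(a, i), count_by_formula(a, i))
```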

The problem concerning the probability of ruin in the course of a 
prescribed number of games for a player playing against an infinitely 
rich adversary was first considered by de Moivre, who gave both the 
preceding solutions without proof; it was later solved completely by 
Lagrange and Laplace. The elementary treatment can be found in 
Bertrand's “Calcul des probabilités.” 

6. Formulas (11) and (12), though elegant and useful when n is not 
large, become impracticable when n is somewhat large, and that is pre¬ 
cisely the most interesting case. Since the question of the risk of ruin 
incurred in playing equitable games possesses special interest, it would not 
be out of place at least to indicate here, though without proof, a con¬ 
venient approximate expression for the probability ya,n in case of a large 
n and p = q = 1/2. Let t be defined by 


\/2(n + iY 

then for n ^ 50 it is possible to establish the approximate formula 






6n 


where −1 < \(\theta\) < 1. Suppose, for instance, that the fortune of a player 
amounts to $100, each stake being $1, and he decides to play 1,000, 




5,000, 10,000, 100,000, 1,000,000 games. Corresponding to these cases, 
we find 

t = 2.2354, 0.9999, 0.7071, 0.2236, 0.0707 

and hence 

\[\frac{2}{\sqrt{\pi}}\int_0^t e^{-z^2}\,dz = 0.9984,\ 0.8427,\ 0.6827,\ 0.2482,\ 0.0796.\]

The corresponding approximate values of \(y_{100,n}\) are 

0.0016, 0.1573, 0.3173, 0.7518, 0.9204. 

Thus, for a player possessing $100 there is very little risk of being ruined 
in the course of 1,000 games even if he stakes $1 at each game. The risk 
is considerably larger, but still fairly small, when 5,000 games are played. 
In 10,000 games we can bet 2 to 1 that the player will still be able to 
continue. But when the limit set for the number of games becomes 
100,000, we can bet 3 to 1 that the player will be ruined somewhere in the 
course of those 100,000 games. Finally, there is little chance to escape 
ruin in a series of 1,000,000 games. The risk of ruin naturally increases 
with the number of games, but not so fast as might appear at first sight. 
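
The numerical values quoted above are easy to reproduce. In the Python sketch below (ours) the values of t are taken directly from the text, the integral \(\frac{2}{\sqrt{\pi}}\int_0^t e^{-z^2}dz\) is evaluated with the standard library function `math.erf`, and the approximate probability of ruin is taken as 1 minus that integral, the correction term being neglected.

```python
from math import erf


def approximate_ruin(t):
    """1 - (2/sqrt(pi)) * integral of exp(-z^2) from 0 to t, correction neglected."""
    return 1.0 - erf(t)


games = (1_000, 5_000, 10_000, 100_000, 1_000_000)
t_values = (2.2354, 0.9999, 0.7071, 0.2236, 0.0707)  # as quoted in the text

if __name__ == "__main__":
    for n, t in zip(games, t_values):
        print(f"n = {n:>9,}: integral = {erf(t):.4f}, ruin = {approximate_ruin(t):.4f}")
```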

7. We conclude this chapter by solving the following problem, 
where the fortunes of both players are finite. 

Problem 4. Players A and B agree to play not more than n games, 
the probabilities of winning a single game being p and q, respectively. 
Assuming that the fortunes of A and B amount to a and b single stakes 
which are equal for both, find the probability for A to be ruined in the 
course of n games. 

Solution. Let \(z_{x,t}\) be the probability for the player A to be ruined 
when his fortune is x (and that of his adversary a + b − x) and he can 
play only t games. 

Evidently \(z_{x,t}\) satisfies the equation 

\[z_{x,t} = p\,z_{x+1,t-1} + q\,z_{x-1,t-1} \tag{13}\]

perfectly similar to equation (8), but the complementary conditions 
serving to determine \(z_{x,t}\) completely are different. First we have 

\[z_{0,t} = 1 \quad\text{for}\quad t \ge 0. \tag{14}\]

Next, 

\[z_{a+b,t} = 0 \quad\text{for}\quad t \ge 0, \tag{15}\]

because if A gets all the money from B, the games stop and A cannot be 
ruined. Finally, 

\[z_{x,0} = 0 \quad\text{for}\quad x = 1, 2, 3, \ldots, a + b - 1, \tag{16}\]

because A, having money left at the end of play, naturally cannot be 
ruined. 

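Since (13) has two series of particular solutions 

\[\alpha^x\beta^t \quad\text{and}\quad \alpha'^x\beta^t\]

where \(\alpha\) and \(\alpha'\) are roots of the equation 

\[p\alpha^2 - \beta\alpha + q = 0\]

both developable into series of descending powers of \(\beta\) for \(|\beta| > 1\), we 
shall seek \(z_{x,t}\) in the form 

\[z_{x,t} = \frac{1}{2\pi i}\int_c \beta^t\left[f(\beta)\alpha^x + \varphi(\beta)\alpha'^x\right]d\beta.\]

Here the integration is made along a circle of sufficiently large radius and 
\(f(\beta)\) and \(\varphi(\beta)\) are two unknown functions which can be developed into 
series of descending powers of \(\beta\). Obviously \(z_{x,t}\) satisfies (13) identically 
in x and t. For x = 0 and t ≥ 0 we have the condition 

\[\frac{1}{2\pi i}\int_c \left[f(\beta) + \varphi(\beta)\right]\beta^t\,d\beta = 1; \qquad t = 0, 1, 2, \ldots\]

which is satisfied if 

\[f(\beta) + \varphi(\beta) = \frac{1}{\beta - 1}. \tag{17}\]

Condition (15) will be satisfied if 

\[\alpha^{a+b}f(\beta) + \alpha'^{a+b}\varphi(\beta) = 0 \tag{18}\]

and it remains to show that at the same time (16) is satisfied. Solving 
(17) and (18), we have 

\[f(\beta) = \frac{\alpha'^{a+b}}{\alpha'^{a+b} - \alpha^{a+b}}\cdot\frac{1}{\beta - 1}, \qquad \varphi(\beta) = -\frac{\alpha^{a+b}}{\alpha'^{a+b} - \alpha^{a+b}}\cdot\frac{1}{\beta - 1},\]

and 

\[f(\beta)\alpha^x + \varphi(\beta)\alpha'^x = \frac{\alpha'^{a+b}\alpha^x - \alpha^{a+b}\alpha'^x}{(\beta - 1)(\alpha'^{a+b} - \alpha^{a+b})} = \left(\frac{q}{p}\right)^x \frac{\alpha'^{a+b-x} - \alpha^{a+b-x}}{(\beta - 1)(\alpha'^{a+b} - \alpha^{a+b})}. \tag{19}\]

Now let \(\alpha\) be the root vanishing for \(\beta = \infty\) and \(\alpha'\) the other root whose 
development in series of descending powers of \(\beta\) starts with the term 
containing \(\beta\). Evidently the development of (19) for 

\[x = 1, 2, 3, \ldots, a + b - 1\]

does not contain terms involving the first power of \(1/\beta\) and hence 
\(z_{x,0} = 0\) if x = 1, 2, 3, . . . a + b − 1 as it should be. The solution 
of (13) satisfying (14), (15), (16) being unique, its analytical expression is 
therefore 

\[z_{x,t} = \frac{1}{2\pi i}\int_c \left(\frac{q}{p}\right)^x \frac{\alpha'^{a+b-x} - \alpha^{a+b-x}}{\alpha'^{a+b} - \alpha^{a+b}}\cdot\frac{\beta^t\,d\beta}{\beta - 1}\]

whence for x = a and t = n 

\[z_{a,n} = \frac{1}{2\pi i}\int_c \left(\frac{q}{p}\right)^a \frac{\alpha'^{b} - \alpha^{b}}{\alpha'^{a+b} - \alpha^{a+b}}\cdot\frac{\beta^n\,d\beta}{\beta - 1}.\]

To find an explicit expression for \(z_{a,n}\) it remains to find the coefficient of 
\(1/\beta\) in the development of 

\[P = \left(\frac{q}{p}\right)^a \frac{\alpha'^{b} - \alpha^{b}}{\alpha'^{a+b} - \alpha^{a+b}}\cdot\frac{\beta^n}{\beta - 1}\]

in series of descending powers of \(\beta\). This can be done in two different 
ways. First we can substitute for \(\alpha'\) its expression in \(\alpha\): 

\[\alpha' = \frac{q}{p\alpha}\]

and present P in the form 

\[P = \frac{\alpha^a - \left(\frac{p}{q}\right)^b \alpha^{a+2b}}{1 - \left(\frac{p}{q}\right)^{a+b}\alpha^{2a+2b}}\cdot\frac{\beta^n}{\beta - 1}\]

or developing into series 

\[P = \left[\alpha^a - \left(\frac{p}{q}\right)^b \alpha^{a+2b} + \left(\frac{p}{q}\right)^{a+b}\alpha^{3a+2b} - \left(\frac{p}{q}\right)^{a+2b}\alpha^{3a+4b} + \cdots\right]\frac{\beta^n}{\beta - 1}.\]

But the coefficient of \(1/\beta\) in 

\[\alpha^m\,\frac{\beta^n}{\beta - 1}\]

by the second solution of Prob. 3 is the probability \(y_{m,n}\) for a player with 
a fortune m to be ruined by an infinitely rich player in the course of n 
games. Hence, the final expression for \(z_{a,n}\) is 

\[z_{a,n} = y_{a,n} - \left(\frac{p}{q}\right)^b y_{a+2b,n} + \left(\frac{p}{q}\right)^{a+b} y_{3a+2b,n} - \left(\frac{p}{q}\right)^{a+2b} y_{3a+4b,n} + \cdots\]

the terms of this series being alternately of the form 

\[\left(\frac{p}{q}\right)^{k(a+b)} y_{(2k+1)a+2kb,\,n}\]

\[-\left(\frac{p}{q}\right)^{k(a+b)+b} y_{(2k+1)a+(2k+2)b,\,n}\]

for k = 0, 1, 2, . . . . The series stops by itself as soon as the first 
subscript of y becomes greater than n. 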
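
The alternating series lends itself to a simple numerical check. The Python sketch below is ours, with hypothetical function names; it evaluates \(y_{m,n}\) by formula (12), sums the series for \(z_{a,n}\) until the first subscript exceeds n, and compares the result with a direct stepping of equation (13) under conditions (14), (15), (16).

```python
from math import comb


def y_infinite(p, m, n):
    """y_{m,n} by formula (12): ruin within n games, adversary infinitely rich."""
    q = 1 - p
    if n < m:
        return 0.0
    k = (n - m) // 2
    return q ** m * sum(
        m * comb(m + 2 * i, i) // (m + 2 * i) * (p * q) ** i for i in range(k + 1)
    )


def z_by_series(p, a, b, n):
    """z_{a,n} as the alternating series in y_{m,n}."""
    q = 1 - p
    r = p / q
    total, k = 0.0, 0
    while (2 * k + 1) * a + 2 * k * b <= n:      # later terms vanish
        total += r ** (k * (a + b)) * y_infinite(p, (2 * k + 1) * a + 2 * k * b, n)
        total -= r ** (k * (a + b) + b) * y_infinite(p, (2 * k + 1) * a + (2 * k + 2) * b, n)
        k += 1
    return total


def z_by_recurrence(p, a, b, n):
    """z_{a,n} by stepping equation (13) with conditions (14), (15), (16)."""
    q = 1 - p
    s = a + b
    z = [1.0] + [0.0] * s
    for _ in range(n):
        new = z[:]                               # keeps z_0 = 1 and z_{a+b} = 0
        for x in range(1, s):
            new[x] = p * z[x + 1] + q * z[x - 1]
        z = new
    return z[a]


if __name__ == "__main__":
    print(z_by_series(0.4, 2, 3, 15), z_by_recurrence(0.4, 2, 3, 15))
```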

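To obtain a second expression of \(z_{a,n}\) we notice that 

\[R = \frac{\alpha'^{b} - \alpha^{b}}{\alpha' - \alpha} : \frac{\alpha'^{a+b} - \alpha^{a+b}}{\alpha' - \alpha}\]

is a rational function of \(\beta\) whose denominator 

\[\frac{\alpha'^{a+b} - \alpha^{a+b}}{\alpha' - \alpha}\]

is a polynomial in \(\beta\) of the degree a + b − 1. To find the roots of this 
polynomial, we set \(\beta = 2\sqrt{pq}\cos\varphi\). Since, then, 

\[\alpha' = \sqrt{\frac{q}{p}}\,e^{i\varphi}, \qquad \alpha = \sqrt{\frac{q}{p}}\,e^{-i\varphi},\]

we have 

\[R = \left(\frac{p}{q}\right)^{a/2}\frac{\sin b\varphi}{\sin (a + b)\varphi}.\]

The equation 

\[\frac{\sin (a + b)\varphi}{\sin\varphi} = 0\]

having roots 

\[\varphi_h = \frac{\pi h}{a + b}; \qquad h = 1, 2, \ldots, a + b - 1,\]

the a + b − 1 roots of the denominator of R are 

\[\beta_h = 2\sqrt{pq}\cos\varphi_h.\]

Now we can resolve the rational function P into a sum of simple elements 
as follows: 

\[P = E(\beta) + \frac{A_0}{\beta - 1} + \sum_{h=1}^{a+b-1}\frac{A_h}{\beta - \beta_h}\]

where 

\[A_0 = \frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}}\]

and for h > 0 

\[A_h = -\left(\frac{q}{p}\right)^{a/2}\frac{(2\sqrt{pq})^{n+1}\,\sin\varphi_h}{(a + b)(1 - 2\sqrt{pq}\cos\varphi_h)}\,\sin a\varphi_h\,(\cos\varphi_h)^n\]

while \(E(\beta)\) is the integral part of P. The coefficient of \(1/\beta\) in the 
development of P being 

\[A_0 + A_1 + \cdots + A_{a+b-1},\]

we have a new explicit expression for \(z_{a,n}\): 

\[z_{a,n} = \frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}} - \left(\frac{q}{p}\right)^{a/2}\frac{(2\sqrt{pq})^{n+1}}{a + b}\sum_{h=1}^{a+b-1}\frac{\sin\frac{\pi h}{a+b}\,\sin\frac{\pi a h}{a+b}}{1 - 2\sqrt{pq}\cos\frac{\pi h}{a+b}}\left(\cos\frac{\pi h}{a+b}\right)^n. \tag{20}\]

This expression shows clearly that \(z_{a,n}\), with increasing n, approaches 
the limit 

\[\frac{q^a(p^b - q^b)}{p^{a+b} - q^{a+b}}\]

representing the probability of ruin when the number of games is 
unlimited, in complete accord with the solution of Prob. 1. 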
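
The trigonometric expression (20) may be checked against the difference equation (13) itself. The Python sketch below is ours; it assumes (20) in the form of a limiting term minus \((q/p)^{a/2}(2\sqrt{pq})^{n+1}/(a+b)\) times a finite sum over h = 1, . . . a + b − 1, and it replaces the limiting term by b/(a + b) when the games are equitable. The function names are hypothetical.

```python
from math import sin, cos, pi, sqrt


def z_by_formula_20(p, a, b, n):
    """z_{a,n} from the trigonometric expression (20)."""
    q = 1 - p
    s = a + b
    if abs(p - q) < 1e-12:
        limit = b / s                      # equitable games
    else:
        limit = q ** a * (p ** b - q ** b) / (p ** s - q ** s)
    g = 2 * sqrt(p * q)
    tail = 0.0
    for h in range(1, s):
        phi = pi * h / s
        tail += sin(phi) * sin(a * phi) * cos(phi) ** n / (1 - g * cos(phi))
    return limit - (q / p) ** (a / 2) * g ** (n + 1) / s * tail


def z_by_recurrence(p, a, b, n):
    """z_{a,n} by stepping equation (13) with conditions (14), (15), (16)."""
    q = 1 - p
    s = a + b
    z = [1.0] + [0.0] * s
    for _ in range(n):
        new = z[:]                         # keeps z_0 = 1 and z_{a+b} = 0
        for x in range(1, s):
            new[x] = p * z[x + 1] + q * z[x - 1]
        z = new
    return z[a]


if __name__ == "__main__":
    for n in (0, 1, 5, 20, 100):
        print(n, z_by_formula_20(0.4, 2, 3, n), z_by_recurrence(0.4, 2, 3, n))
```

With a = 1, b = 2 and equitable games the same function can be compared with the closed answer of Prob. 5 below.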

The first term in (20) naturally must be replaced by \(\frac{b}{a+b}\) in case 
p = q = 1/2. This form of solution was given first by Lagrange. 


Problems for Solution 

1. Players A and B with fortunes of $50 and $100, respectively, agree to play until 
one of them is ruined. The probabilities of winning a single game are 2/3 and 1/3, 
respectively, for A and B, and they stake $1 at each game. What is the probability 
of ruin for the player A? Ans. Very nearly \(2^{-50} = 8.88 \cdot 10^{-16}\). 

2. If A and B at each single game stake $3 and $2, respectively, and have fortunes 
of $30 and $20 at the beginning, what is the approximate value of the probability 
that A will be ruined if the probability of his winning a single game is (a) p = 3/5; 
(b) p = 1/2? 

Ans. (a) 0.40 + Δ; |Δ| < \(1.7 \times 10^{-2}\); (b) 0.96 + Δ; |Δ| < \(4.6 \times 10^{-3}\). 

3. A player A with the fortune $a plays an unlimited number of games against an 
infinitely rich adversary with the probability p of winning a single game. He stakes 
$1 at each game, while his rich adversary risks staking such a sum \(\beta\) as to make the 
game favorable to A. What is the probability that A will be ruined in the course 
of the games? Give numerical results if (a) a = 10, p = 1/2, \(\beta\) = 3; (b) a = 100, 
p = 1/2, \(\beta\) = 3. Ans. Let \(\theta < 1\) be a positive root of the equation \(p\theta^{\beta+1} - \theta + q = 0\). 

The required probability P is: \(P = \theta^a\). 

In case (a) P = 0.002257; in case (b) \(P = 3.43 \cdot 10^{-27}\). 

4. A player A whose fortune is $10 agrees to play not more than 20 games against 
an infinitely rich adversary, both staking $1 with an equal probability of winning a 
single game. What is the probability that A will not be ruined in the course of 
20 games? Ans. 0.9734. 

5. Players A and B with $1 and $2, respectively, agree to play not more than n 
equitable games, staking $1 at each game. What are the probabilities of their ruin? 

Ans. For A: \(\frac{2}{3} - \frac{3 + (-1)^n}{3 \cdot 2^{n+1}}\); for B: \(\frac{1}{3} - \frac{3 - (-1)^n}{3 \cdot 2^{n+1}}\). 

6. Players A and B with $2 and $3, respectively, play a series of equitable games, 
both staking $1 at each game. What are the probabilities of their ruin in n games? 
Give the numerical result if n = 20. Ana. 




= 1 if n is odd. 


=2 if n is even. 


17 =* 1 if n is even, 17» 2 if n is odd. 


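7. Find the expression of \(y_{a,n}\), the probability of the ruin of A when his adversary 
B is infinitely rich, corresponding to formula (20). Ans. From the definition of a 
definite integral it follows that 

\[y_{a,n} = y_{a,\infty} - \left(\frac{q}{p}\right)^{a/2}\frac{(2\sqrt{pq})^{n+1}}{\pi}\int_0^{\pi}\frac{\sin\varphi\,\sin a\varphi}{1 - 2\sqrt{pq}\cos\varphi}\,(\cos\varphi)^n\,d\varphi\]

where 

\[y_{a,\infty} = 1 \ \text{ if } \ p \le q; \qquad y_{a,\infty} = \left(\frac{q}{p}\right)^a \ \text{ if } \ p \ge q.\]

If the games are equitable and n differs from a by an even number, then 

\[y_{a,n} = 1 - \frac{2}{\pi}\int_0^{\pi/2}\frac{\sin a\varphi}{\sin\varphi}\,(\cos\varphi)^{n+1}\,d\varphi.\]

This formula was given by Laplace. 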

8. Referring to the last formula in the preceding problem, show that 


ya,H 




'du -|- A 


t 


a 

V2(n + I)’ 




y«n 

32 . 


where 

Indication of the Proof. It is important to prove the following inequalities first 


Ip (cos 
sin ip 


for 0 < ^ ^ - 
2 


^(cos 


n4-i . (n-H)»4 


0 <». g- 


» (cos 




0 <e <i 


provided 0 < ^ ^ t/4. The rest of the proof is easy. 

9 . Attempt a direct proof of the important lemma (page 144) used in the discus¬ 
sion of Prob. 2. 

Hint: The proof can be based upon the following proposition^ generalizing an 
important theorem on determinants due to Minkowski: Let 


fi = ciuXi + auXi “h 


“h OtniXn] i — 1, 2, 3| . 


be a system of linear forms whose coefficients satisfy the following conditions: 

(1) an > 0; Oki ^ 0 if k 9^ i; an au • • • -f a»» ^ 0. 

(2) One of these sums is positive. 

If these forms assume nonnegative values, then every Xi ^ 0(i = 1, 2, . . 
Proof by induction: Express Xn through Xn xtt . . . Xn-u thus: 


fn UinXi atnXt 


* On—l,n35n—I 


and substitute into the remaining forms. Show that the resulting forms in xn zi, 
. . . Xn-i satisfy the same conditions (1) and (2). Hence, it remains to prove the 
proposition for two forms, which can easily be done. 


References 

Chr. Huygens: “De ratiociniis in ludo aleae,” 1664. 

De Moivre: “Doctrine of Chances,” 3d ed., 1756. 

Lagrange: “Mémoire sur les suites récurrentes dont les termes varient de plusieurs 
manières différentes, etc.,” Oeuvres IV, pp. 151ff. 

Laplace: “Théorie analytique des probabilités,” Oeuvres VII, pp. 228-242. 
Bertrand: “Calcul des probabilités,” pp. 104-141, Paris, 1889. 

Markoff: “Wahrscheinlichkeitsrechnung,” pp. 142-146, Leipzig, 1912. 

¹ The author is indebted to Professor Besikovitch of Cambridge, England, for the 
communication of this direct proof. 



CHAPTER IX 


MATHEMATICAL EXPECTATION 

1. Bernoulli’s theorem, important though it is, is but the first link 
in a chain of theorems of the same character, all contained in an extremely 
general proposition with which we shall deal in the next chapter. But 
before proceeding to this task, it is necessary to extend the definition of 
“mathematical expectation,” an important concept originating in 
connection with games of chance. 

If, according to the conditions of the game, the player can win a 
sum a with probability p and lose a sum b with probability q = 1 − p, 
the mathematical expectation of his gain is by definition 

pa − qb. 


Considering the loss as a negative gain, we may say that the gain of the 
player may have only two values, a and — 6 , with the corresponding 
probabilities p and 5 , so that the expectation of his gain is the sum of the 
products of two possible values of the gain by their probabilities. In this 
case, the gain appears as a variable quantity possessing two values. 

Variable quantities with a definite range of values each one of which, 
depending on chance, can be attained with a definite probability, are 
called “chance variables,” or, using a Greek term, “stochastic” variables. 
They play an important part in the theory of probability. Ajj^chastic 
variable is defined (a) if the set of its possible values is given, and (b) if 
the probaKlity tb'attaiirea^ particular value is also giyej^" 

K is^ ^y^o ^ve examples of stochastic variables. yThe gain in a 
game of chance is a stochastic variable with two values. The number of 
points on a die that is tossed, is a stochastic variable with six values, 
1 , 2 , ... 6 , each of which has the same probability J^.^^Tnumber on 
a ticket drawn from an urn containing 20 tickets numbered^om 1 to 20 , 
is a stochastic variable with 20 values, and the probability to attain 
any one of them is Each of two urns contains 2 white and 2 black 

balls. Simultaneously, one ball is transferred from the first urn into the 
second, while one ball from the latter is.transferred into the first. After 
this exchange, the nulnber of white balls in one of the urns may be regarded 
as a stochastic variable with, three values, 1 , 2 , 3^ whose corresponding 
probabilities are, respectively, It is natural to extend the 

concept of mathematical expectation to stochastic variables in general, 


Suppose that a stochastic variable x possesses n values:

$$x_1, x_2, \ldots x_n$$

and

$$p_1, p_2, \ldots p_n$$

denote the respective probabilities for x to assume values $x_1, x_2, \ldots x_n$.
By definition the mathematical expectation of x is

$$E(x) = p_1x_1 + p_2x_2 + \cdots + p_nx_n.$$

It is understood in this definition that the possible values of the
variable x are numerically different. For instance, if the variable is the
number of points on a die, its numerically different values are 1, 2, 3, 4, 5,
6, each having the same probability, 1/6. By definition, the mathematical
expectation of the number of points on a die is

$$\frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = 3.5.$$

If the variable is the number on a ticket drawn from an urn containing
20 tickets numbered from 1 to 20, its numerically different values are
represented by numbers from 1 to 20, and the probability of each of
these values is 1/20, so that the mathematical expectation of the number
on a ticket is

$$\frac{1}{20}(1 + 2 + \cdots + 20) = 10.5.$$
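
The definition can be checked numerically. The following is a minimal Python
sketch, not part of the original text; the function name `expectation` and the
dictionary layout (value mapped to probability) are illustrative assumptions.

```python
from fractions import Fraction

def expectation(distribution):
    """Return E(x) = sum of p_i * x_i for a {value: probability} mapping."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for x, p in distribution.items())

# Number of points on a die, and number on one of 20 tickets.
die = {face: Fraction(1, 6) for face in range(1, 7)}
tickets = {number: Fraction(1, 20) for number in range(1, 21)}

print(expectation(die))      # 7/2
print(expectation(tickets))  # 21/2
```

Exact fractions are used so that the results can be compared with 3.5 and 10.5
without rounding error.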

2. It is obvious that the computation of mathematical expectation
requires only the knowledge of the numerically different values of the
variable with their respective probabilities. But in some cases this
computation is greatly simplified by extending the definition of
mathematical expectation. Suppose that, corresponding to mutually exclusive
and exhaustive cases $A_1, A_2, \ldots A_m$, the variable x assumes the values
$x_1, x_2, \ldots x_m$ with the corresponding probabilities $p_1, p_2, \ldots p_m$;
we can define the mathematical expectation of x by

$$E(x) = p_1x_1 + p_2x_2 + \cdots + p_mx_m.$$

What distinguishes this extended definition from the original one is that
in the second definition the values $x_1, x_2, \ldots x_m$ need not be numerically
different; the only condition is that they are determined by mutually
exclusive and exhaustive cases.

To make this distinction clear, suppose that the variable x is the
number of points on two dice. Numerically different values of this
variable are

2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

and their respective probabilities

1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36.

Therefore, by the original definition, the expectation of x is

$$\frac{2}{36} + \frac{6}{36} + \frac{12}{36} + \frac{20}{36} + \frac{30}{36} + \frac{42}{36} + \frac{40}{36} + \frac{36}{36} + \frac{30}{36} + \frac{22}{36} + \frac{12}{36} = \frac{252}{36} = 7.$$

But we can distinguish 36 exhaustive and mutually exclusive cases according
to the number of points on each die and, correspondingly, 36 values
of the variable x, as shown in the following table:


First die   Second die    x      First die   Second die    x

    1           1         2          4           1         5
    1           2         3          4           2         6
    1           3         4          4           3         7
    1           4         5          4           4         8
    1           5         6          4           5         9
    1           6         7          4           6        10
    2           1         3          5           1         6
    2           2         4          5           2         7
    2           3         5          5           3         8
    2           4         6          5           4         9
    2           5         7          5           5        10
    2           6         8          5           6        11
    3           1         4          6           1         7
    3           2         5          6           2         8
    3           3         6          6           3         9
    3           4         7          6           4        10
    3           5         8          6           5        11
    3           6         9          6           6        12

The probability of each of these 36 cases being 1/36, by the extended
definition the mathematical expectation of x is

$$\frac{2 + 2\cdot 3 + 4\cdot 3 + 5\cdot 4 + 6\cdot 5 + 7\cdot 6 + 8\cdot 5 + 9\cdot 4 + 10\cdot 3 + 11\cdot 2 + 12}{36} = 7$$

as it should be.
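
The agreement of the two definitions for two dice can be reproduced with a
short Python sketch. It is an illustration only; the variable names (`cases`,
`merged`, `extended`, `original`) are assumptions of this sketch.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Extended definition: one term for each of the 36 equally likely cases.
cases = [(a + b, Fraction(1, 36)) for a, b in product(range(1, 7), repeat=2)]
extended = sum(p * x for x, p in cases)

# Original definition: first merge the cases that give the same numerical value.
merged = defaultdict(Fraction)
for x, p in cases:
    merged[x] += p
original = sum(p * x for x, p in merged.items())

print(extended, original)  # 7 7
```

Merging the cases is exactly the grouping step used in the general proof that
the two definitions agree.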

It is important to show that both definitions always give the same 
value for the mathematical expectation. 

Let $x_1, x_2, \ldots x_m$ be the values of the variable x corresponding
to mutually exclusive and exhaustive cases $A_1, A_2, \ldots A_m$, and
$p_1, p_2, \ldots p_m$ their respective probabilities. By the extended
definition of mathematical expectation, we have

(1)    $$E(x) = p_1x_1 + p_2x_2 + \cdots + p_mx_m.$$

The values $x_1, x_2, \ldots x_m$ are not necessarily numerically different,
the numerically different values being

$$\xi, \eta, \zeta, \ldots \lambda.$$

We can suppose that the notation is chosen in such a way that

$x_1, x_2, \ldots x_a$ are equal to $\xi$;

$x_{a+1}, x_{a+2}, \ldots x_b$ are equal to $\eta$;

$x_{b+1}, x_{b+2}, \ldots x_c$ are equal to $\zeta$;

. . . . . . . . . . . . . . .

$x_{l+1}, x_{l+2}, \ldots x_m$ are equal to $\lambda$.

Hence, the right-hand member of (1) can be represented thus:

$$(p_1 + p_2 + \cdots + p_a)\xi + (p_{a+1} + p_{a+2} + \cdots + p_b)\eta + \cdots + (p_{l+1} + p_{l+2} + \cdots + p_m)\lambda.$$

But by the theorem of total probabilities, the sum

$$p_1 + p_2 + \cdots + p_a$$

represents the probability P for the variable x to assume a determined
value $\xi$ because this can happen in $a$ mutually exclusive ways; namely,
when $x = x_1$, or $x = x_2$, ... or $x = x_a$. By a similar argument we see
that the sums

$$p_{a+1} + p_{a+2} + \cdots + p_b$$
$$p_{b+1} + p_{b+2} + \cdots + p_c$$
. . . . . . . . . .
$$p_{l+1} + p_{l+2} + \cdots + p_m$$

represent the probabilities Q, R, ... T for the variable x to assume
values $\eta, \zeta, \ldots \lambda$. Therefore, the right-hand member of (1) reduces
to the sum

$$P\xi + Q\eta + R\zeta + \cdots + T\lambda$$

which, by the original definition, is the mathematical expectation of x.

If, corresponding to mutually exclusive and exhaustive cases, a
variable x assumes the same value a (in other words, remains constant),
it is almost evident that its mathematical expectation is a, because the
sum of the probabilities of mutually exclusive and exhaustive cases is 1.

It is also evident that the expectation of ax, where a is a constant, is
equal to a times the expectation of x.

Note: Very often the mathematical expectation of a stochastic variable is called
its “mean value.”

Mathematical Expectation of a Sum
3. In many cases the computation of mathematical expectation is
greatly facilitated by means of the following very general theorem:

Theorem. The mathematical expectation of the sum of several variables
is equal to the sum of their expectations; or, in symbols,

$$E(x + y + z + \cdots + w) = E(x) + E(y) + E(z) + \cdots + E(w).$$

Proof. We shall prove this theorem first in the case of a sum of two
variables. Let x assume numerically different values $x_1, x_2, \ldots x_m$,
while numerically different values of y are $y_1, y_2, \ldots y_n$. In regard to
the sum x + y we can distinguish mn mutually exclusive cases; namely,
when x assumes a definite value $x_i$ and y another definite value $y_j$, while i
and j range respectively over numbers 1, 2, 3, ... m and 1, 2, 3, ... n.
If $p_{ij}$ denotes the probability of coexistence of the equalities

$$x = x_i, \quad y = y_j$$

we have by the extended definition of mathematical expectation

$$E(x + y) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}(x_i + y_j),$$

or

(2)    $$E(x + y) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}x_i + \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}y_j.$$

As the variable x assumes a definite value $x_i$ in n mutually exclusive
ways (namely, when the value $x_i$ of x is accompanied by the values
$y_1, y_2, \ldots y_n$ of y) it is obvious that the sum

$$\sum_{j=1}^{n} p_{ij}$$

represents the probability $p_i$ of the equality $x = x_i$. In a similar manner
we see that the sum

$$\sum_{i=1}^{m} p_{ij}$$

represents the probability $q_j$ of the equality $y = y_j$. Therefore

$$\sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}x_i = \sum_{i=1}^{m} x_i\sum_{j=1}^{n} p_{ij} = \sum_{i=1}^{m} p_ix_i = E(x).$$

And similarly

$$\sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}y_j = \sum_{j=1}^{n} y_j\sum_{i=1}^{m} p_{ij} = \sum_{j=1}^{n} q_jy_j = E(y);$$

that is, by (2)

$$E(x + y) = E(x) + E(y)$$

which proves the theorem for the sum of two variables.

If we deal with the sum of three variables x + y + z, we may consider
it at first as the sum of x + y and z and, applying the foregoing result,
we get

$$E(x + y + z) = E(x + y) + E(z);$$

and again, by substituting E(x) + E(y) for E(x + y),

$$E(x + y + z) = E(x) + E(y) + E(z).$$

In a similar way we may proceed farther and prove the theorem for the
sum of any number of variables.
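
Nothing in the proof requires the variables to be independent. As a hedged
illustration (the example and its names are assumptions, not taken from the
text), the Python sketch below uses two tickets drawn without replacement from
an urn numbered 1 to 5, so that x and y are dependent, and still finds
E(x + y) = E(x) + E(y).

```python
from fractions import Fraction
from itertools import permutations

# Joint distribution p_ij: two tickets drawn without replacement from 1..5,
# so x (first ticket) and y (second ticket) are dependent variables.
pairs = list(permutations(range(1, 6), 2))
p = Fraction(1, len(pairs))

e_sum = sum(p * (x + y) for x, y in pairs)
e_x = sum(p * x for x, _ in pairs)
e_y = sum(p * y for _, y in pairs)

print(e_sum, e_x + e_y)  # 6 6
```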

4. The theorem concerning mathematical expectation of sums,
simple though it is, is of fundamental importance on account of its very
general nature and will be used frequently. At present, we shall use it
in the solution of a few selected problems.

Problem 1. What is the mathematical expectation of the sum of
points on n dice?

Solution. Denoting by $x_i$ the number of points on the ith die, the
sum of the points on n dice will be

$$s = x_1 + x_2 + \cdots + x_n,$$

and by the preceding theorem

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n).$$

But for every single die

$$E(x_i) = \frac{7}{2}; \quad i = 1, 2, \ldots n;$$

therefore

$$E(s) = \frac{7n}{2}.$$



Problem 2. What is the mathematical expectation of the number of
successes in n trials with constant probability p?

Solution. Suppose that we attach to every trial a variable which
has the value 1 in case of a success and the value 0 in case of failure. If
the variables attached to trials 1, 2, 3, ... n are denoted by $x_1, x_2, \ldots x_n$,
their sum

$$m = x_1 + x_2 + \cdots + x_n$$

obviously gives the number of successes in n trials. Therefore, the
required expectation is

$$E(m) = E(x_1) + E(x_2) + \cdots + E(x_n).$$

But for every i = 1, 2, 3, ... n

$$E(x_i) = p\cdot 1 + (1 - p)\cdot 0 = p,$$

because $x_i$ may have values 1 and 0 with the probabilities p and 1 - p
which are the same as the probabilities of a success or a failure in the ith
trial. Hence,

$$E(m) = np$$

or

$$E(m - np) = 0,$$

which may also be written in the form

$$\sum_{m=0}^{n} T_m(m - np) = 0.$$

This result was obtained on page 116 in a totally different and more
complicated way. The new deduction is preferable in that it is more
elementary and can easily be extended to more complicated cases, as
we shall see in the next problem.
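
The relation just obtained can be verified directly from the binomial
probabilities. The sketch below is illustrative only: it assumes that $T_m$
denotes the probability of exactly m successes in n trials, and the values
n = 12 and p = 1/3 are arbitrary choices.

```python
from fractions import Fraction
from math import comb

def T(n, m, p):
    """Probability of exactly m successes in n trials (binomial term T_m)."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

n, p = 12, Fraction(1, 3)
e_m = sum(T(n, m, p) * m for m in range(n + 1))

print(e_m, n * p)                                            # 4 4
print(sum(T(n, m, p) * (m - n * p) for m in range(n + 1)))   # 0
```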

Problem 3. Suppose that we have a series of n trials, independent or
not, the probability of an event being $p_i$ in the ith trial when nothing is
known about the results of other trials. What is the mathematical
expectation of the number of successes m in n trials?

Solution. Again let us introduce the variable $x_i$ connected with
the ith trial in such a way that $x_i = 1$ when the trial results in a success
and $x_i = 0$ when it results in failure. Obviously,

$$m = x_1 + x_2 + \cdots + x_n$$

and

$$E(m) = E(x_1) + E(x_2) + \cdots + E(x_n).$$

But

$$E(x_i) = 1\cdot p_i + 0\cdot(1 - p_i) = p_i$$

and therefore

$$E(m) = p_1 + p_2 + \cdots + p_n.$$

For instance, if we have 5 urns containing 1 white, 9 black; 2 white,
8 black; 3 white, 7 black; 4 white, 6 black; 5 white, 5 black balls, and we
draw one ball out of every urn, the mathematical expectation of the
number of white balls taken will be:

$$E(m) = \frac{1}{10} + \frac{2}{10} + \frac{3}{10} + \frac{4}{10} + \frac{5}{10} = \frac{3}{2}.$$

Problem 4. An urn contains a white and b black balls, and c balls are
drawn. What is the mathematical expectation of the number of the
white balls drawn?

Solution. To every ball taken we attach a variable which has the
value 1 if the extracted ball is white, and the value 0 otherwise. The
number of white balls drawn will then be

$$s = x_1 + x_2 + \cdots + x_c.$$

But the probability that the ith ball removed will be white when nothing
is known of the other balls is $\frac{a}{a+b}$; therefore

$$E(x_i) = 1\cdot\frac{a}{a+b} + 0\cdot\frac{b}{a+b} = \frac{a}{a+b}$$

for every i, and the required expectation is

$$E(s) = \frac{ca}{a+b}.$$
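
The result ca/(a + b) can be confirmed by enumerating every possible draw for
a small urn. This Python sketch is an illustration under assumed values (3
white, 4 black, 3 balls drawn); the labels `"w"` and `"k"` are arbitrary.

```python
from fractions import Fraction
from itertools import combinations

a, b, c = 3, 4, 3                      # illustrative urn: 3 white, 4 black, 3 drawn
balls = ["w"] * a + ["k"] * b
draws = list(combinations(range(a + b), c))   # equally likely sets of c balls

expected_white = Fraction(
    sum(sum(balls[i] == "w" for i in draw) for draw in draws), len(draws)
)
print(expected_white, Fraction(c * a, a + b))  # 9/7 9/7
```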


Problem 5. An urn contains n tickets numbered from 1 to n, and
m tickets are drawn at a time. What is the mathematical expectation
of the sum of numbers on the tickets drawn?

Solution. Suppose that m tickets drawn from the urn are disposed
in a certain order, and a variable is attached to every ticket expressing
its number. Denoting the variable attached to the ith ticket by $x_i$,
the sum of the numbers on all m tickets apparently is

$$s = x_1 + x_2 + \cdots + x_m.$$

But when taken singly, the variable $x_i$ may represent any of the numbers
1, 2, 3, ... n, the probability of its being equal to any one of these
numbers being 1/n. By the definition of mathematical expectation, we
have

$$E(x_i) = \frac{1 + 2 + 3 + \cdots + n}{n} = \frac{n + 1}{2},$$

and therefore

$$E(s) = \frac{m(n + 1)}{2}.$$

For example, taking the French lottery where n = 90 and m = 5, we
find for the mathematical expectation of the sum of numbers on all 5
tickets

$$E(s) = \frac{5\cdot 91}{2} = 227.5.$$


Problem 6. An urn contains n tickets numbered from 1 to n. These
tickets are drawn one by one, so that a certain number appears in the
first place, another number in the second place, and so on. We shall say
that there is a “coincidence” when the number on a ticket corresponds
to the place it occupies. For instance, there is a coincidence when the
first ticket has number 1 or the second ticket has number 2, etc. Find
the mathematical expectation of the number of coincidences. Also, find
the probability that there will be none, or one, or two, etc., coincidences.

Solution. Let $x_i$ denote a variable which has the value 1 if there is a
coincidence in the ith place; otherwise $x_i = 0$. The sum

$$s = x_1 + x_2 + \cdots + x_n$$

gives the total number of coincidences and

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n).$$

But

$$E(x_i) = 1\cdot\frac{1}{n} = \frac{1}{n}$$

because the probability of drawing a ticket with the number i in the ith
place without any regard to other tickets obviously is 1/n; therefore,

$$E(s) = n\cdot\frac{1}{n} = 1.$$

On the other hand, denoting the probability of exactly i coincidences by
$p_i$, we have by definition

$$E(s) = p_1 + 2p_2 + \cdots + np_n,$$

and, comparing with the preceding result, we obtain

(3)    $$p_1 + 2p_2 + \cdots + np_n = 1.$$

Let us denote by $\varphi(n)$ the probability that in drawing n tickets, we shall
have no coincidences. It is easy to express $p_i$ by means of $\varphi(n - i)$.
In fact, we have exactly i coincidences in

$$C_n^i = \frac{n(n-1)\cdots(n-i+1)}{1\cdot 2\cdot 3\cdots i}$$

mutually exclusive cases; namely, when the tickets of one of the $C_n^i$
specified groups of i tickets have numbers corresponding to their places
while the remaining n - i tickets do not present coincidences at all.
By the theorem of compound probability, the probability of i coincidences
in i specified places is

$$\frac{1}{n}\cdot\frac{1}{n-1}\cdots\frac{1}{n-i+1}$$

and the probability of the absence of coincidences in the remaining n - i
places is $\varphi(n - i)$. The probability of exactly i coincidences in i specified
places is therefore

$$\frac{\varphi(n-i)}{n(n-1)\cdots(n-i+1)},$$

and the total probability $p_i$ of exactly i coincidences without specification
of places is

$$p_i = \frac{n(n-1)\cdots(n-i+1)}{1\cdot 2\cdot 3\cdots i}\cdot\frac{\varphi(n-i)}{n(n-1)\cdots(n-i+1)}$$

or

(4)    $$p_i = \frac{\varphi(n-i)}{1\cdot 2\cdot 3\cdots i}.$$

The symbol $\varphi(0)$ has no meaning, but the preceding formula holds
good even for i = n if we assume $\varphi(0) = 1$.

Substituting expression (4) for $p_i$ into (3), we reach the relation

$$\varphi(n-1) + \frac{\varphi(n-2)}{1!} + \frac{\varphi(n-3)}{2!} + \cdots + \frac{\varphi(0)}{(n-1)!} = 1$$

or, changing n into n + 1,

$$\varphi(n) + \frac{\varphi(n-1)}{1!} + \frac{\varphi(n-2)}{2!} + \cdots + \frac{\varphi(0)}{n!} = 1,$$

which gives successively $\varphi(1), \varphi(2), \varphi(3), \ldots$ by taking
n = 1, 2, 3, ....

The general result, which can easily be verified, is

$$\varphi(n) = \sum_{k=0}^{n}\frac{(-1)^k}{k!}$$

or, in an explicit form,

$$\varphi(n) = 1 - \frac{1}{1} + \frac{1}{1\cdot 2} - \frac{1}{1\cdot 2\cdot 3} + \cdots + \frac{(-1)^n}{1\cdot 2\cdot 3\cdots n}.$$

Even for moderate n this is very near to

$$\frac{1}{e} = 1 - \frac{1}{1} + \frac{1}{1\cdot 2} - \frac{1}{1\cdot 2\cdot 3} + \cdots \text{ ad inf.} = 0.36787944.$$
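
Formula (4) and the expression for $\varphi(n)$ can be compared with a direct
count over all orders of drawing. The following Python sketch is illustrative;
the function names `phi` and `p_exact` and the choice n = 6 are assumptions of
the sketch.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def phi(n):
    """Probability of no coincidence among n tickets: sum of (-1)^k / k!."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))

def p_exact(n, i):
    """Formula (4): probability of exactly i coincidences."""
    return phi(n - i) / factorial(i)

n = 6
counts = [0] * (n + 1)
for perm in permutations(range(n)):
    # Count the places whose ticket number matches the place.
    counts[sum(place == ticket for place, ticket in enumerate(perm))] += 1

for i in range(n + 1):
    assert Fraction(counts[i], factorial(n)) == p_exact(n, i)

print(float(phi(n)))                                  # close to 1/e
print(sum(i * p_exact(n, i) for i in range(n + 1)))   # 1
```

The last line reproduces relation (3): the expected number of coincidences is 1
whatever n may be.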


Mathematical Expectation of a Product 
5. For the product of two or more stochastic variables we do not
possess anything so general as the foregoing theorem concerning the
mathematical expectation of sums. An analogous theorem with respect
to the product of stochastic variables can be established only under
certain restrictive conditions.

Several stochastic variables are called “independent” if the probability
for any one of them to assume a determined value does not depend
on the values assumed by the remaining variables. For instance, if the
variables are the numbers of points on dice, they may be considered as
independent.

On the other hand, we have a case of dependent variables in numbers 
on tickets drawn in a lottery. For, in this case the fact that certain 
tickets have determined numbers precludes the possibility of any one of 
these numbers appearing on other tickets drawn at the same time. 

If more than two variables are independent according to the above 
definition, it is clear that any two of them are independent. But the 
converse is not true: It is easy to imagine cases when any two of the 
variables are independent and yet they are not independent when taken 
in their totality. Therefore, when speaking of independence of variables, 
we must always specify whether they are independent in their totality 
or only in pairs. 
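
One standard example of variables independent in pairs but not in their
totality, which is not taken from the text and is offered only as an
illustration, uses two fair 0/1 variables x and y together with a third
variable z equal to 1 when x and y differ. A minimal Python sketch:

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: x and y are fair 0/1 variables, z = 1 when x != y.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
p = Fraction(1, len(outcomes))

def prob(event):
    """Probability of the set of outcomes satisfying the predicate `event`."""
    return sum(p for o in outcomes if event(o))

# Every pair of variables is independent ...
for i, j in ((0, 1), (0, 2), (1, 2)):
    for u, v in product((0, 1), repeat=2):
        joint = prob(lambda o: o[i] == u and o[j] == v)
        assert joint == prob(lambda o: o[i] == u) * prob(lambda o: o[j] == v)

# ... yet the three are not independent in their totality.
all_ones = prob(lambda o: o == (1, 1, 1))
print(all_ones, Fraction(1, 8))  # 0 1/8
```

If the three variables were independent in their totality, the event
x = y = z = 1 would have probability 1/8; here it is impossible.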

For two independent variables we have the following simple theorem: 

Theorem. The mathematical expectation of the product xy of two
independent variables x and y is equal to the product of their expectations;
or, in symbols,

$$E(xy) = E(x)E(y).$$

Proof. Let $x_1, x_2, \ldots x_m$ be the complete set of values for x, and
$y_1, y_2, \ldots y_n$ the analogous set for y. Denoting the probability of
x being equal to $x_i$ by $p_i$ and, similarly, the probability of y being equal
to $y_j$ by $q_j$, the events

$$x = x_i \quad\text{and}\quad y = y_j$$

are independent by the definition of independence, because the probability
of x being equal to $x_i$ is not affected by the fact that y has assumed any
one of its possible values, and it remains $p_i$.

By the theorem of compound probability the simultaneous occurrence
of the events

$$x = x_i \quad\text{and}\quad y = y_j$$

has the probability $p_iq_j$. Again, by the extended definition of
mathematical expectation

$$E(xy) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_iq_jx_iy_j$$

because the values of the product xy are determined by mn exhaustive
and mutually exclusive cases

$$x = x_i, \quad y = y_j; \qquad i = 1, 2, \ldots m; \quad j = 1, 2, \ldots n.$$

Now, performing the summation with respect to j first, while i remains
constant, we have

$$\sum_{j=1}^{n} p_iq_jx_iy_j = p_ix_i\sum_{j=1}^{n} q_jy_j = p_ix_iE(y),$$

and again

$$\sum_{i=1}^{m}\sum_{j=1}^{n} p_iq_jx_iy_j = \sum_{i=1}^{m} p_ix_iE(y) = E(y)\sum_{i=1}^{m} p_ix_i,$$

or

$$E(xy) = E(x)E(y).$$

This theorem can be extended to the case of several factors independent
in their totality. For instance, if x, y, z are independent, it is
obvious that xy and z are also independent. Hence

$$E(xyz) = E(xy)E(z),$$

and again

$$E(xyz) = E(x)E(y)E(z).$$

In a similar way we can extend this theorem to any number of
independent factors.

As an important application, let us consider two independent variables
x and y with the respective expectations a and b. The variables x - a
and y - b being independent also, we have

$$E(x - a)(y - b) = E(x - a)E(y - b);$$

but

$$E(x - a) = E(x) - a = a - a = 0;$$

therefore

(5)    $$E(x - a)(y - b) = 0.$$

Dispersion and Standard Deviation 
6. Let x be a variable and a its mathematical expectation. The
expectation of

$$(x - a)^2$$

is called the “dispersion” of the variable, and the square root of the dispersion
is usually called the “standard deviation.” As

$$(x - a)^2 = x^2 - 2ax + a^2$$

we can apply the theorem on the expectation of sums to the right-hand
member of this identity and find

$$E(x - a)^2 = E(x^2) - 2aE(x) + a^2 = E(x^2) - a^2$$

or, denoting by b the expectation of $x^2$,

(6)    $$E(x - a)^2 = b - a^2.$$

Thus, the computation of dispersion can be reduced to the computation
of the expectation of the variable itself and its square. Also, denoting
by $\sigma$ the standard deviation of x, we have the formula

$$\sigma^2 = b - a^2.$$

For instance, if the variable is the number of points on a die, we have

$$a = \frac{7}{2}, \quad b = \frac{1^2 + 2^2 + \cdots + 6^2}{6} = \frac{91}{6}$$

and

$$\sigma^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} = 2.917; \quad \sigma = 1.708.$$
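
The figures for the die follow from formula (6) in a few lines. The Python
sketch below is an illustration only; the variable names mirror the symbols a
and b of the text.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)

a = sum(p * x for x in faces)          # E(x)
b = sum(p * x * x for x in faces)      # E(x^2)
dispersion = b - a * a                 # formula (6)

# Same quantity straight from the definition E(x - a)^2.
assert dispersion == sum(p * (x - a) ** 2 for x in faces)

print(dispersion)                      # 35/12
print(round(sqrt(dispersion), 3))      # 1.708
```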

Dispersion of Sums

7. It is important to have a convenient formula to find the dispersion
of a sum

$$s = x_1 + x_2 + \cdots + x_n$$

of several stochastic variables. The expectation of s is given by

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n)$$

or

$$E(s) = a_1 + a_2 + \cdots + a_n,$$

denoting by $a_i$ the expectation of $x_i$. The deviation of s from its
expectation is, therefore,

$$x_1 + x_2 + \cdots + x_n - (a_1 + a_2 + \cdots + a_n),$$

and we have to find the expectation of

$$(x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2.$$

Now we have identically

$$(x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2 = \sum_{i=1}^{n}(x_i - a_i)^2 + 2\sum_{i<j}(x_i - a_i)(x_j - a_j),$$

the last sum being extended over all the different combinations of
subscripts i and j for which i < j and consisting of n(n - 1)/2 terms.
The mathematical expectation of a sum being equal to the sum of the
expectations of its terms, we must find the expectations of the terms

$$(x_i - a_i)^2 \quad\text{and}\quad (x_i - a_i)(x_j - a_j).$$

The first is the dispersion of $x_i$ and can be found from (6); namely,

$$E(x_i - a_i)^2 = b_i - a_i^2 = \sigma_i^2$$

if $b_i$ is the expectation of $x_i^2$.

As to

$$E(x_i - a_i)(x_j - a_j),$$

instead of it we introduce the so-called “correlation coefficient” of $x_i$
and $x_j$

$$R_{i,j} = \frac{E(x_i - a_i)(x_j - a_j)}{\sigma_i\sigma_j}.$$

Denoting the required dispersion by D, we obtain

(7)    $$D = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 + 2R_{1,2}\sigma_1\sigma_2 + 2R_{1,3}\sigma_1\sigma_3 + \cdots + 2R_{n-1,n}\sigma_{n-1}\sigma_n$$

so that the dispersion of a sum can be obtained as soon as we know the
dispersions of its terms and their correlation coefficients.

In an important case, expression (7) for dispersion can be greatly
simplified. If the variables $x_1, x_2, \ldots x_n$ are independent in pairs, we
see from (5) that all the correlation coefficients are 0, so that in this
case simply

(8)    $$D = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 = b_1 - a_1^2 + b_2 - a_2^2 + \cdots + b_n - a_n^2.$$

In other words, the dispersion of a sum of variables, any two of which
are independent, is equal to the sum of dispersions of its terms.

8. A few examples will serve to illustrate the use of these formulas.
Problem 7. Find the dispersion of the number of successes in a series
of n independent trials with probabilities $p_1, p_2, \ldots p_n$ corresponding to
the first, second, ... nth trial.

Solution. As in Prob. 2 we associate with every trial a variable which
assumes the value 1 or 0, according as the trial resulted in success or
failure. These variables $x_1, x_2, \ldots x_n$ are independent because the
trials are supposed to be independent. The number of successes

$$m = x_1 + x_2 + \cdots + x_n$$

is thus the sum of independent variables. To find the dispersion of
any one of these variables we notice that, with $q_i = 1 - p_i$,

$$E(x_i) = 1\cdot p_i + 0\cdot q_i = p_i$$
$$E(x_i^2) = 1\cdot p_i + 0\cdot q_i = p_i;$$

therefore the dispersion of $x_i$ is

$$\sigma_i^2 = p_i - p_i^2 = p_iq_i$$

and by (8)

$$D = E(m - p_1 - p_2 - \cdots - p_n)^2 = p_1q_1 + p_2q_2 + \cdots + p_nq_n.$$

In the Bernoullian case of independent trials with the same probability
p, we have $p_1 = p_2 = \cdots = p_n = p$ and

$$E(m - np)^2 = npq.$$

This formula is equivalent to the relation

$$\sum_{m=0}^{n} T_m(m - np)^2 = npq$$

established on page 116.

Problem 8. In a lottery m tickets are drawn at a time out of n
tickets numbered from 1 to n. Find the dispersion of the sum s of the
numbers on the tickets drawn.

Solution. Let $x_1, x_2, \ldots x_m$ be the variables representing the
numbers on the first, second, ... mth tickets. By Prob. 5 we know that

$$E(x_i) = \frac{n + 1}{2}$$

and in a similar way we find

$$E(x_i^2) = \frac{1^2 + 2^2 + \cdots + n^2}{n} = \frac{(n + 1)(2n + 1)}{6},$$

whence the dispersion of $x_i$ is

$$\sigma_i^2 = \frac{(n + 1)(2n + 1)}{6} - \left(\frac{n + 1}{2}\right)^2 = \frac{n^2 - 1}{12}.$$

Since we deal in the present case with dependent variables, we must
find the correlation coefficients, or, which is the same,

$$E\left(x_i - \frac{n + 1}{2}\right)\left(x_j - \frac{n + 1}{2}\right)$$

for every pair of subscripts i and j. The variable $x_i$ may have any of
the values 1, 2, 3, ... n, with the same probability 1/n; and $x_j$ may
have any of the same values, with the exception of that assumed by $x_i$,
with the probability $\frac{1}{n-1}$. The preceding expression consists of
terms

$$\frac{1}{n(n-1)}\left(x_i - \frac{n + 1}{2}\right)\left(x_j - \frac{n + 1}{2}\right)$$

where $x_j$ for given $x_i = 1, 2, \ldots n$, ranges over all numbers 1, 2,
3, ... n with the exception of $x_i$. As

$$\sum_{x_j=1}^{n}\left(x_j - \frac{n + 1}{2}\right) = 0$$

it is obvious that

$$\sum_{x_j \ne x_i}\left(x_j - \frac{n + 1}{2}\right) = -\left(x_i - \frac{n + 1}{2}\right)$$

and

$$E\left(x_i - \frac{n + 1}{2}\right)\left(x_j - \frac{n + 1}{2}\right) = -\frac{1}{n(n-1)}\sum_{x_i=1}^{n}\left(x_i - \frac{n + 1}{2}\right)^2 = -\frac{n + 1}{12}.$$

Everything now is ready for the application of (7). All simplifications
performed, we get the following expression of the required dispersion

$$D = \frac{m(n^2 - 1)}{12}\left(1 - \frac{m - 1}{n - 1}\right).$$

If the variables were independent, the dispersion would be

$$\frac{m(n^2 - 1)}{12}.$$

The dependence diminishes it, but the influence of dependence is not great
if the ratio m/n is small.
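
For a small lottery the dispersion can be obtained by listing every possible
drawing and compared with the closed form $\frac{m(n^2-1)}{12}\left(1 - \frac{m-1}{n-1}\right)$,
which follows from (7) with the values found above. The Python sketch is
illustrative; the function name and the values n = 6, m = 3 are assumptions.

```python
from fractions import Fraction
from itertools import combinations

def dispersion_of_sum(n, m):
    """Dispersion of the sum of m tickets drawn at a time from 1..n, by enumeration."""
    sums = [sum(draw) for draw in combinations(range(1, n + 1), m)]
    mean = Fraction(sum(sums), len(sums))
    return sum((s - mean) ** 2 for s in sums) / len(sums)

n, m = 6, 3
formula = Fraction(m * (n * n - 1), 12) * (1 - Fraction(m - 1, n - 1))
print(dispersion_of_sum(n, m), formula)  # 21/4 21/4
```

With independent variables the same n and m would give 35/4, so the dependence
reduces the dispersion here by the factor 3/5.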


Problems for Solution 

1. Find the mathematical expectation M of the absolute value of the discrepancy
m - np in a series of n independent trials with constant probability p. Ans. By
definition

$$M = \sum_{m=0}^{n} T_m|m - np|$$

where, as usual,

$$T_m = \frac{n!}{m!(n - m)!}p^mq^{n-m}.$$

But since

$$\sum_{m=0}^{n} T_m(m - np) = 0,$$

we have also

$$M = 2\sum_{m > np} T_m(m - np),$$

the sum being extended over all integers m which are > np. Denoting by F(x, y) the
sum

$$F(x, y) = \sum_{m > np} C_n^m x^m y^{n-m}$$

we have

$$\sum_{m > np} T_m(m - np) = p\frac{\partial F}{\partial p} - npF(p, q).$$

On the other hand, by Euler's theorem on homogeneous functions

$$nF(p, q) = p\frac{\partial F}{\partial p} + q\frac{\partial F}{\partial q},$$

whence

$$\sum_{m > np} T_m(m - np) = npq\,C_{n-1}^{\mu-1}p^{\mu-1}q^{n-\mu}.$$

Here $\mu$ represents an integer determined by

$$\mu \le np + 1 < \mu + 1.$$

The answer is therefore given by the simple formula

$$M = 2npq\,C_{n-1}^{\mu-1}p^{\mu-1}q^{n-\mu}.$$
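
The closed formula can be compared with the defining sum for a few values of n
and p. This Python sketch is illustrative; it assumes that $\mu$ is the integer
part of np + 1, as the inequality above prescribes, and the function names are
not from the text.

```python
from fractions import Fraction
from math import comb, floor

def mean_abs_discrepancy(n, p):
    """M = sum over m of T_m * |m - np|, computed directly from the definition."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m) * abs(m - n * p) for m in range(n + 1))

def closed_form(n, p):
    """M = 2npq * C(n-1, mu-1) * p^(mu-1) * q^(n-mu), with mu = floor(np + 1)."""
    q = 1 - p
    mu = floor(n * p + 1)
    return 2 * n * p * q * comb(n - 1, mu - 1) * p**(mu - 1) * q**(n - mu)

for n, p in [(10, Fraction(1, 2)), (7, Fraction(1, 3)), (12, Fraction(1, 4))]:
    assert mean_abs_discrepancy(n, p) == closed_form(n, p)
```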

2 . By applying Stirling's formula (Appendix 1, page 347) prove the following 
result: 


where 


Y_J_L_^ 

^ - l’- 1/ 


and n is so large as to make c ^ Ho- 
Hint: 


/ l2npq\ d d' 1 / 1 1 \ 

log ) < 2{np ^T) ~ S \np - d nq - »') 


i» 12(n? - O') 4{np - 0)' 4(ng - 0')* 
osijgi: »' + » = ! 
3. What is the expectation of the number of failures preceding the first success in
an indefinite series of independent trials with the probability p?

Ans. $qp + 2q^2p + 3q^3p + \cdots = \frac{q}{p} = \frac{1 - p}{p}.$

4. Balls are taken one by one out of an urn containing a white and b black balls
until the first white ball is drawn. What is the expectation of the number of black
balls preceding the first white ball?

Ans. 1. By direct application of the definition the following first expression for the
required expectation M is obtained:

$$M = \frac{a}{a + b}\left[\frac{b}{a + b - 1} + 2\frac{b(b - 1)}{(a + b - 1)(a + b - 2)} + 3\frac{b(b - 1)(b - 2)}{(a + b - 1)(a + b - 2)(a + b - 3)} + \cdots\right].$$

Ans. 2. However, it is possible to find a simpler expression for M. Denote by $x_1$ the
number of black balls preceding the first white ball, by $x_2$ the number of black balls
between the first and second white ball, and so on; finally, by $x_{a+1}$ the number of black
balls following the last white ball. We have

$$x_1 + x_2 + \cdots + x_{a+1} = b$$

and

$$E(x_1) + E(x_2) + \cdots + E(x_{a+1}) = b.$$

But as the probability of every sequence of balls (that is, of every system of numbers
$x_1, x_2, \ldots x_{a+1}$) is the same, namely,

$$\frac{a!\,b!}{(a + b)!},$$

it is easy to see that

$$E(x_1) = E(x_2) = \cdots = E(x_{a+1}) = M.$$

That is,

$$(a + 1)M = b$$

or

$$M = \frac{b}{a + 1}.$$

Equating this to the preceding expression for M, an interesting identity can be
obtained, whose direct proof is left to the student.
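
Before attempting a direct proof, the identity can be checked numerically. The
Python sketch below is an illustration only: it sums k times the probability
that exactly k black balls precede the first white one, and the function name
and test values are assumptions.

```python
from fractions import Fraction

def expected_blacks_series(a, b):
    """First expression: sum over k of k * P(exactly k black balls precede the first white)."""
    total = Fraction(0)
    prob_k_blacks_first = Fraction(1)          # probability the first k balls are all black
    for k in range(1, b + 1):
        prob_k_blacks_first *= Fraction(b - k + 1, a + b - k + 1)
        total += k * prob_k_blacks_first * Fraction(a, a + b - k)
    return total

for a, b in [(1, 1), (2, 3), (4, 7)]:
    assert expected_blacks_series(a, b) == Fraction(b, a + 1)
```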

5. In Prob. 6, page 168, to determine the probability $\varphi(n)$, we had an equation

$$\varphi(n) + \frac{\varphi(n - 1)}{1!} + \frac{\varphi(n - 2)}{2!} + \cdots + \frac{\varphi(0)}{n!} = 1; \quad \varphi(0) = 1.$$

Find the general expression for $\varphi(n)$ using the method of generating functions. Ans.
Let

$$F(x) = \varphi(0) + \varphi(1)x + \varphi(2)x^2 + \cdots$$

be the generating function of $\varphi(n)$. Multiplying this series by

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

we find

$$e^xF(x) = 1 + x + x^2 + \cdots$$

or

$$e^xF(x) = \frac{1}{1 - x},$$

whence

$$F(x) = \frac{e^{-x}}{1 - x}; \quad \varphi(n) = 1 - \frac{1}{1!} + \frac{1}{2!} - \cdots + \frac{(-1)^n}{n!}.$$


6. The total number of balls in an urn is known, but the number of white balls
depends on chance and only its mathematical expectation is known. Find the
probability of drawing a white ball. Ans. Let N be the total number of balls and M the
expectation of the number of white balls. The required probability is M/N.

7. Two urns contain, respectively, a white and b black and $\alpha$ white and $\beta$ black
balls. A certain number c (naturally not exceeding a + b) of balls is transferred
from the first urn into the second. What is the probability of drawing a white ball
from the second urn after the transfers? Ans. The required probability is

$$\frac{\alpha + \dfrac{ca}{a + b}}{\alpha + \beta + c}.$$


8. An urn contains a white and b black balls. After a ball is drawn, it is to be
returned to the urn if it is white; but if it is black, it is to be replaced by a white ball
from another urn. What is the probability of drawing a white ball after the foregoing
operation has been repeated x times? Ans. Denote by $M_x$ the expectation of the
number of white balls after x operations. From the equation

$$M_{x+1} = M_x + 1 - \frac{M_x}{a + b}$$

the following expression for $M_x$ can be derived:

$$M_x = a + b - b\left(1 - \frac{1}{a + b}\right)^x.$$

It follows that the required probability is

$$1 - \frac{b}{a + b}\left(1 - \frac{1}{a + b}\right)^x.$$

9. Urns 1 and 2 contain, respectively, a white and b black and c white and d black
balls. One ball is taken from the first urn and transferred into the second, while
simultaneously one ball taken from the second urn is transferred into the first. What
is the probability of drawing a white ball from the first urn after such an exchange
has been repeated x times? Ans. Let $M_x$ and $P_x$ represent the mathematical
expectations of the number of white balls in the first and second urn after x exchanges. Then

$$M_{x+1} = M_x + \frac{P_x}{c + d} - \frac{M_x}{a + b}, \qquad M_x + P_x = a + c,$$

whence

$$M_x = \frac{(a + c)(a + b)}{a + b + c + d} + \frac{ad - bc}{a + b + c + d}\left(1 - \frac{1}{a + b} - \frac{1}{c + d}\right)^x.$$

10. An urn contains pN white and qN black balls, the total number of balls being
N. Balls are drawn one by one (without being returned to the urn) until a certain
number n of balls is reached. What is the dispersion of the number m of white balls
drawn? Ans. Let $x_i = 1$ if the ith ball drawn is white and $x_i = 0$ if it is black.
We have

$$E(x_i) = p, \quad E(m) = np, \quad E(x_i^2) = p$$

and

$$E(x_i - p)(x_j - p) = E(x_ix_j) - p^2 = -\frac{pq}{N - 1}.$$

The required dispersion is

$$D = E(m - np)^2 = npq\,\frac{N - n}{N - 1}.$$


11. In a lottery containing n numbers (1, 2, 3, ... n) m numbers are drawn at a
time. Let $x_i$ represent the frequency of a specified number i in N drawings. Prove
that

$$E(x_i) = Np, \quad E(x_i - Np)^2 = Npq$$
$$E(x_i - Np)(x_j - Np) = Np(p' - p); \quad (i \ne j)$$

where

$$p = \frac{m}{n}, \quad q = 1 - p, \quad p' = \frac{m - 1}{n - 1}.$$


12. Let

$$z_i = (x_i - Np)^2 - Npq.$$

Show that the dispersion of the sum

$$z_1 + z_2 + \cdots + z_n$$

is

$$D = \frac{2N(N - 1)}{n - 1}(npq)^2.$$

Indication of the Proof. Let N variables $\xi_1, \xi_2, \ldots \xi_N$ be defined as follows:

$\xi_k = -p$ if in the kth drawing the number i fails to appear
$\xi_k = q$ if in the kth drawing the number i appears.

In a similar way, we can define N variables $\eta_1, \eta_2, \ldots \eta_N$ associated with the
number $j \ne i$. Since

$$x_i - Np = \xi_1 + \xi_2 + \cdots + \xi_N$$
$$x_j - Np = \eta_1 + \eta_2 + \cdots + \eta_N$$

we have

$$e^{u(x_i - Np) + v(x_j - Np)} = e^{u\xi_1 + v\eta_1}e^{u\xi_2 + v\eta_2}\cdots e^{u\xi_N + v\eta_N}.$$

The variables

$$u\xi_1 + v\eta_1, \quad u\xi_2 + v\eta_2, \quad \ldots \quad u\xi_N + v\eta_N$$

being independent, we have

$$E(e^{u(x_i - Np) + v(x_j - Np)}) = E(e^{u\xi_1 + v\eta_1})E(e^{u\xi_2 + v\eta_2})\cdots E(e^{u\xi_N + v\eta_N}).$$

But

$$E(e^{u\xi_1 + v\eta_1}) = \cdots = E(e^{u\xi_N + v\eta_N}) = pp'e^{qu + qv} + p(1 - p')e^{qu - pv} + p(1 - p')e^{qv - pu} + (q - p + pp')e^{-pu - pv} = F(u, v).$$

Hence

$$E(e^{u(x_i - Np) + v(x_j - Np)}) = F(u, v)^N.$$

It suffices to expand both members into power series in u and v and compare terms
involving $u^2v^2$ to find

$$E(z_iz_j); \quad i \ne j.$$

The rest does not present serious difficulties except for somewhat complicated
calculations.

13. A box contains $2^n$ tickets among which $C_n^i$ tickets bear the number i (i =
0, 1, 2, . . . n). A group of m tickets is drawn; denoting by s the sum of their
numbers, it is required to find the expectation E and the dispersion D of s.

Ans. $$E = \frac{1}{2}mn; \qquad D = \frac{1}{4}mn - \frac{m(m-1)n}{4(2^n-1)}.$$
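The answer to Prob. 13 can be verified for particular n and m. The sketch below is an illustration, not the author's method: it computes the mean and dispersion of a single ticket directly from the population and then applies the finite-population factor (T − m)/(T − 1) for a sample of m tickets drawn without replacement from T = 2^n tickets, a standard result assumed here. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def ticket_sum_moments(n, m):
    """Expectation and dispersion of the sum of m tickets drawn without replacement."""
    total = 2 ** n                                   # number of tickets in the box
    mean = Fraction(sum(i * comb(n, i) for i in range(n + 1)), total)
    second = Fraction(sum(i * i * comb(n, i) for i in range(n + 1)), total)
    var = second - mean ** 2                         # dispersion of one ticket
    # Finite-population correction for sampling without replacement (assumed result).
    return m * mean, m * var * Fraction(total - m, total - 1)

def stated_answer(n, m):
    """E and D as given in the answer to the problem."""
    e = Fraction(m * n, 2)
    d = Fraction(m * n, 4) - Fraction(m * (m - 1) * n, 4 * (2 ** n - 1))
    return e, d
```

With n = 6 and m = 10 both functions give E = 30 and D = 90/7, that is, about 12.857.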


14. A box contains k varieties of objects, the number of objects of each variety
being the same. These objects are drawn one at a time and put back before the
next drawing. Denoting by n the smallest number of drawings which produce
objects of all varieties, find $E(n)$ and $E(n^2)$. Ans.

$$E(n) = k\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k}\right)$$

$$E(n^2) = \{E(n)\}^2 - k\left(1 + \frac{1}{2} + \cdots + \frac{1}{k}\right) + k^2\left(1 + \frac{1}{2^2} + \cdots + \frac{1}{k^2}\right).$$

Use the result of Prob. 12, p. 41.
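As a check on Prob. 14, the waiting time n can be split into k independent stages, the jth stage lasting until a new variety appears when j varieties are already present. This decomposition is a standard argument assumed here rather than taken from the text; the sketch compares it with the stated formulas, and the function names are illustrative.

```python
from fractions import Fraction

def moments_by_stages(k):
    """Mean and dispersion of n as a sum of independent geometric waiting times."""
    mean = Fraction(0)
    dispersion = Fraction(0)
    for found in range(k):
        p = Fraction(k - found, k)       # chance that the next drawing gives a new variety
        mean += 1 / p
        dispersion += (1 - p) / (p * p)
    return mean, dispersion

def moments_by_formula(k):
    """E(n) and E(n^2) - E(n)^2 from the stated answer."""
    h1 = sum(Fraction(1, i) for i in range(1, k + 1))
    h2 = sum(Fraction(1, i * i) for i in range(1, k + 1))
    return k * h1, k * k * h2 - k * h1
```

For k = 4 (the four suits of a deck) both give the expectation 25/3 and the dispersion 130/9, about 14.44.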
CHAPTER X


THE LAW OF LARGE NUMBERS 


1. The developments of the preceding chapter, combined with a 
simple lemma due to Tshebysheff, lead in a natural and easy way to a 
far reaching generalization of Bernoulli’s theorem, known under the 
name of the “law of large numbers.” 

Tshebysheff's Lemma. Let u be a variable which does not assume
negative values, and a its mathematical expectation. The probability of the
inequality

$$u \le at^2$$

is always greater than

$$1 - \frac{1}{t^2}$$

whatever t may be.

Proof. Let

$$u_1, u_2, \ldots u_n$$

be all the possible values of the variable u and

$$p_1, p_2, \ldots p_n$$

their respective probabilities. By the definition of mathematical expectation, we have

(1)    $$p_1u_1 + p_2u_2 + \cdots + p_nu_n = a.$$

We may suppose the notations so chosen that

$$u_1, u_2, \ldots u_\alpha$$

are all the values of u which are $\le at^2$, the remaining values

$$u_{\alpha+1}, u_{\alpha+2}, \ldots u_n$$

being $> at^2$. If all the terms in (1) with subscripts 1, 2, . . . α are
dropped, the left-hand member can only be diminished, since these
terms are positive or at least nonnegative by hypothesis. We have,
therefore,

$$p_{\alpha+1}u_{\alpha+1} + \cdots + p_nu_n \le a.$$

But as

$$u_i > at^2$$

for i = α + 1, α + 2, . . . n a still stronger inequality,

$$at^2(p_{\alpha+1} + \cdots + p_n) < a$$

or

$$p_{\alpha+1} + \cdots + p_n < \frac{1}{t^2}$$

will hold.

Here the left-hand member represents the probability Q of the
inequality

$$u > at^2$$

because this inequality can materialize only in the following mutually
exclusive forms: either $u = u_{\alpha+1}$, or $u = u_{\alpha+2}$, . . . or $u = u_n$ whose
probabilities are, respectively, $p_{\alpha+1}, p_{\alpha+2}, \ldots p_n$. Thus

$$Q < \frac{1}{t^2}.$$

But if P is the probability of the opposite event

$$u \le at^2$$

we must have

$$P + Q = 1,$$

whence

$$P > 1 - \frac{1}{t^2},$$

which proves the lemma.
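The lemma can be illustrated on any small distribution of a nonnegative variable. The sketch below uses an arbitrary distribution chosen only for illustration and exact rational arithmetic; the function name is hypothetical.

```python
from fractions import Fraction

def tshebysheff_lemma_check(values, probs, t):
    """Return (a, P, bound): the expectation a, P = Prob(u <= a t^2), and 1 - 1/t^2."""
    a = sum(p * u for u, p in zip(values, probs))
    prob_within = sum(p for u, p in zip(values, probs) if u <= a * t * t)
    bound = 1 - Fraction(1) / (t * t)
    return a, prob_within, bound

# Illustrative distribution of a nonnegative variable u (an assumption, not from the text).
VALUES = [Fraction(0), Fraction(1), Fraction(4), Fraction(9)]
PROBS = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
```

Here a = 15/8; for t = 2 the probability of $u \le at^2$ is 7/8, which exceeds the bound 3/4.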

2. Let $x_1, x_2, \ldots x_n$ be a set of stochastic variables and $a_1, a_2, \ldots a_n$
their respective expectations. The dispersion of the sum

$$x_1 + x_2 + \cdots + x_n$$

which we shall denote by $B_n$ is, by definition, the mathematical expectation
of the variable

$$u = (x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2.$$

Tshebysheff's lemma, applied to this variable u, shows that the probability
of the inequality

$$(x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n)^2 \le B_nt^2$$

is greater than

$$1 - \frac{1}{t^2}.$$

But the preceding inequality is equivalent to two inequalities

$$-t\sqrt{B_n} \le x_1 + x_2 + \cdots + x_n - a_1 - a_2 - \cdots - a_n \le t\sqrt{B_n}$$

or, dividing through by n,

$$-t\sqrt{\frac{B_n}{n^2}} \le \frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n} \le t\sqrt{\frac{B_n}{n^2}}.$$

Hence, the probability of these inequalities for an arbitrary positive t
is greater than

$$1 - \frac{1}{t^2}.$$

Let ε be an arbitrary positive number. Defining t by the equation

$$t\sqrt{\frac{B_n}{n^2}} = \epsilon$$

whence

$$\frac{1}{t^2} = \frac{B_n}{n^2\epsilon^2},$$

we arrive at the following conclusion: The probability P of the inequalities

$$-\epsilon \le \frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n} \le \epsilon$$

equivalent to a single inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{a_1 + a_2 + \cdots + a_n}{n}\right| \le \epsilon$$

is greater than

$$1 - \frac{B_n}{n^2\epsilon^2}.$$


Thus far nothing has been supposed about the behavior of $B_n$ for
indefinitely increasing n. We shall now suppose that the quotient
$B_n/n^2$ tends to 0 as n increases indefinitely. Then, having chosen two
arbitrarily small positive numbers ε and η, a number $n_0$ can be found so
that the inequality

$$\frac{B_n}{n^2\epsilon^2} < \eta$$

will hold for $n > n_0$. Consequently, we shall have

$$P > 1 - \eta$$

for all $n > n_0$. This conclusion leads to the following important theorem
due, in the main, to Tshebysheff:

The Law of Large Numbers. With the probability approaching 1 or
certainty as near as we please, we may expect that the arithmetic mean of
values actually assumed by n stochastic variables will differ from the arithmetic
mean of their expectations by less than any given number, however small,
provided the number of variables can be taken sufficiently large and provided
the condition

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty$$

is fulfilled.

If, instead of variables $x_i$, we consider new variables $z_i = x_i - a_i$
with their means = 0, the same theorem can be stated as follows:

For a fixed ε > 0, however small, the probability of the inequality

$$\left|\frac{z_1 + z_2 + \cdots + z_n}{n}\right| \le \epsilon$$

tends to 1 as a limit when n increases indefinitely, provided

$$\frac{B_n}{n^2} \to 0.$$

This theorem is very general. It holds for independent or dependent
variables indifferently if the sufficient condition for its validity, namely,
that

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty$$

is fulfilled.
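To see how the bound of Sec. 2 behaves, one may take n independent variables each equal to 0 or 1 with probability 1/2 (a fair coin, chosen here only as an illustration), for which $B_n = n/4$. The sketch compares the exact probability of $|m/n - 1/2| \le \epsilon$ with the lower bound $1 - B_n/(n^2\epsilon^2)$; the function names are assumptions.

```python
from fractions import Fraction
from math import comb

def coin_exact_probability(n, eps):
    """Exact Prob(|m/n - 1/2| <= eps) for m heads in n tosses of a fair coin."""
    half = Fraction(1, 2)
    favorable = sum(comb(n, m) for m in range(n + 1)
                    if abs(Fraction(m, n) - half) <= eps)
    return Fraction(favorable, 2 ** n)

def coin_tshebysheff_bound(n, eps):
    """Lower bound 1 - B_n / (n^2 eps^2) with B_n = n/4."""
    b_n = Fraction(n, 4)
    return 1 - b_n / (n * n * eps * eps)
```

With ε = 1/10 the bound is 3/4 for n = 100 and 15/16 for n = 400; it approaches 1 as n grows, and the exact probability always lies above it.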

3. This condition, which is recognized as sufficient, is at the same
time necessary, if the variables $z_1, z_2, \ldots z_n$ are uniformly bounded;
that is, if a constant number (one independent of n), C, can be found
so that all particular values of $z_i$ (i = 1, 2, . . . n) are numerically less
than C. Let P, as before, denote the probability of the inequality

$$|z_1 + z_2 + \cdots + z_n| \le n\epsilon.$$

Then the probability of the opposite inequality

$$|z_1 + z_2 + \cdots + z_n| > n\epsilon$$

will be 1 − P.

Now, by definition,

$$B_n = E(z_1 + z_2 + \cdots + z_n)^2$$

whence one can easily derive the inequality

$$B_n < n^2C^2(1 - P) + n^2\epsilon^2P$$

from which it follows that

$$\frac{B_n}{n^2} < C^2(1 - P) + \epsilon^2P < \epsilon^2 + C^2(1 - P).$$

If the law of large numbers holds, 1 − P converges to 0 when n
increases indefinitely, so that the right-hand member for sufficiently
large n becomes less than any given number, and that implies

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty,$$

which proves the statement.

4. There is an important case in which the law of large numbers
certainly holds; namely, when variables $x_1, x_2, \ldots x_n$ are independent
and the expectations of their squares are bounded. Then a constant
number C exists such that

$$b_i = E(x_i^2) < C \quad \text{for } i = 1, 2, 3, \ldots$$

On the other hand, for independent variables

$$B_n = \sum_{i=1}^{n} E(x_i - a_i)^2 \le \sum_{i=1}^{n} b_i < nC$$

and

$$\frac{B_n}{n^2} < \frac{C}{n} \to 0 \quad \text{as } n \to \infty.$$

The expectations of squares are bounded, for instance, when all the
variables are uniformly bounded, which is true, in particular, for "identical"
or "equal" variables. Variables are said to be identical if they
possess the same set of values with the same corresponding probabilities.

5. E. Czuber made a complete investigation of the results of 2,854
drawings in a lottery operated in Prague between 1754 and 1886. It
consisted of 90 numbers, of which 5 were taken in each drawing. From
Czuber's book "Wahrscheinlichkeitsrechnung," vol. 1, p. 141 (2d ed.,
1908), we reprint the following table.

Numbers               Their frequency m    Difference m - 158
6                          138                 -20
39, 65                     139                 -19
16, 41, 76, 87             142                 -16
2, 14, 56, 79, 86          143                 -15
18, 44, 47                 144                 -14
72, 80                     145                 -13
12                         146                 -12
21, 53                     147                 -11
70                         149                  -9
24, 32, 55, 69             150                  -8
27, 64, 75                 151                  -7
81                         152                  -6
23, 29, 85                 153                  -5
19, 35, 42, 74             154                  -4
7, 20, 59                  155                  -3
13, 34, 40, 67, 88         156                  -2
11, 52, 68                 157                  -1
17, 82                     158                   0
15, 90                     159                   1
58                         160                   2
8, 25, 36                  161                   3
22                         162                   4
33, 57                     163                   5
51                         164                   6
3, 43, 45, 48              165                   7
10, 26, 66                 166                   8
1, 5, 60, 84               167                   9
50, 62                     168                  10
9, 61, 63                  170                  12
54, 73                     171                  13
49, 71, 78                 172                  14
28                         173                  15
37                         176                  18
30, 46                     177                  19
89                         178                  20
31                         179                  21
38                         184                  26
4                          185                  27
77                         186                  28
83                         189                  31

With the 2,854 drawings, we associate 2,854 variables, $x_1, x_2, \ldots x_{2854}$
representing the sum of five numbers appearing in each of the 2,854
drawings. These variables are identical and independent with the
common mathematical expectation 227.5. Hence, by the law of large
numbers, we can expect that the arithmetic mean of actually observed
values of these variables will not notably differ from 227.5. To form
the sum

$$S = \sum_{i=1}^{2854} x_i$$

we must multiply the frequencies given in the preceding table by the
sum of corresponding numbers. To simplify the task we notice that all
numbers from 1 to 90 actually appeared. Hence, we multiply the
sum of these numbers, 4,095, by 158, which gives:

4095 · 158 = 647,010,

and then add to this number the sum of the differences m − 158 multiplied
by the sum of the numbers in the same line. The results are:

Sum of positive products     22,336
Sum of negative products    -19,587.

Hence

S = 647,010 + 22,336 − 19,587 = 649,759

and

$$\frac{S}{2854} = 227.67,$$

which differs very little from the expected value 227.5. An even larger
difference would be in perfect agreement with the law of large numbers
since 2,854, the number of variables, is not very great.
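The arithmetic of this section can be reproduced from the table. In the sketch below the table is transcribed as (frequency, numbers) pairs; the row with frequency 150 is taken as 24, 32, 55, 69, the reading under which every number from 1 to 90 occurs exactly once and the negative products total 19,587. The variable and function names are illustrative assumptions.

```python
# Czuber's table as (frequency m, numbers having that frequency).
ROWS = [
    (138, (6,)), (139, (39, 65)), (142, (16, 41, 76, 87)),
    (143, (2, 14, 56, 79, 86)), (144, (18, 44, 47)), (145, (72, 80)),
    (146, (12,)), (147, (21, 53)), (149, (70,)), (150, (24, 32, 55, 69)),
    (151, (27, 64, 75)), (152, (81,)), (153, (23, 29, 85)),
    (154, (19, 35, 42, 74)), (155, (7, 20, 59)), (156, (13, 34, 40, 67, 88)),
    (157, (11, 52, 68)), (158, (17, 82)), (159, (15, 90)), (160, (58,)),
    (161, (8, 25, 36)), (162, (22,)), (163, (33, 57)), (164, (51,)),
    (165, (3, 43, 45, 48)), (166, (10, 26, 66)), (167, (1, 5, 60, 84)),
    (168, (50, 62)), (170, (9, 61, 63)), (171, (54, 73)), (172, (49, 71, 78)),
    (173, (28,)), (176, (37,)), (177, (30, 46)), (178, (89,)), (179, (31,)),
    (184, (38,)), (185, (4,)), (186, (77,)), (189, (83,)),
]

def lottery_sums(rows, base=158):
    """Return (base part, positive products, negative products, S)."""
    base_part = base * sum(sum(nums) for _, nums in rows)
    products = [(m - base) * sum(nums) for m, nums in rows]
    positive = sum(p for p in products if p > 0)
    negative = sum(p for p in products if p < 0)
    return base_part, positive, negative, base_part + positive + negative

BASE_PART, POSITIVE, NEGATIVE, S = lottery_sums(ROWS)
MEAN = S / 2854          # arithmetic mean of the 2,854 observed sums
```

The mean obtained this way rounds to 227.67, against the expectation 227.5.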

6. The two experiments reported in this section were made by the
author in spare moments. In the first experiment 64 tickets bearing
numbers 0, 1, 2, 3, 4, 5, 6 and occurring in the following proportions:

Number       0    1    2    3    4    5    6
Frequency    1    6   15   20   15    6    1

were vigorously agitated in a tin can and then 10 tickets were drawn at a
time and their numbers added. Altogether 2,500 such drawings were
made and their results carefully recorded. From these records we
derive Tables I and II.


Table I

Number    Frequency observed    Expected frequency    Discrepancy
0                404                 390.625            +13.375
1              2,321               2,343.75             -22.75
2              5,850               5,859.375             -9.375
3              7,863               7,812.5              +50.5
4              5,821               5,859.375            -38.375
5              2,344               2,343.75              +0.25
6                397                 390.625             +6.375


The next table gives the absolute values of differences s − 30 where s
is the sum of the numbers on 10 tickets drawn at one time, and their
respective frequencies.

Table II

|s - 30|    Frequency observed    |s - 30|    Frequency observed
   0              246                 7              71
   1              549                 8              44
   2              479                 9              25
   3              379                10               8
   4              324                11               4
   5              241                12               1
   6              129

From Table I it is easy to find that the arithmetic mean of all 2,500
sums observed is

$$\frac{74996}{2500} = 29.9984$$

whereas the expectation of each of the 2,500 identical variables under
consideration by Prob. 13, page 181, is 30. By the same problem the
dispersion of s, that is, $E(s - 30)^2$ is 12.857. On the other hand, from
Table II we find that

$$\Sigma(s - 30)^2 = 31477$$

and

$$\frac{\Sigma(s - 30)^2}{2500} = 12.5908$$

fairly close to 12.857.
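The two quotients can be recomputed from Tables I and II. This is a sketch for checking the arithmetic only; the frequency for |s − 30| = 2 is taken as 479, the value for which the frequencies total 2,500 and the sum of squares equals 31,477. Names are illustrative.

```python
from fractions import Fraction

# Table I: ticket number -> observed frequency among the 25,000 tickets drawn.
TABLE_I = {0: 404, 1: 2321, 2: 5850, 3: 7863, 4: 5821, 5: 2344, 6: 397}

# Table II: |s - 30| -> observed frequency among the 2,500 drawings.
TABLE_II = {0: 246, 1: 549, 2: 479, 3: 379, 4: 324, 5: 241, 6: 129,
            7: 71, 8: 44, 9: 25, 10: 8, 11: 4, 12: 1}

def mean_of_sums(table, drawings=2500):
    """Total of all ticket numbers drawn, and the mean sum per drawing."""
    total = sum(number * freq for number, freq in table.items())
    return total, Fraction(total, drawings)

def observed_dispersion(table):
    """Number of drawings, sum of (s - 30)^2, and their quotient."""
    count = sum(table.values())
    squares = sum(d * d * freq for d, freq in table.items())
    return count, squares, Fraction(squares, count)
```

The expected value 30 and dispersion 12.857 come from Prob. 13 with n = 6 and m = 10.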

In the second experiment we tried to produce cards of every suit in n
drawings (n being the smallest number required) of one card at a time,
each card taken being returned before the next drawing. By Prob. 14,
page 181, we find that the expectation and the dispersion of this number
n are, respectively, $8\frac{1}{3}$ and 14.44. Altogether 3,000 values of n were
recorded, of which 33 was the largest. Values of the difference n − 8 are
given in Table III.


Table III

n - 8    Frequency    n - 8    Frequency    n - 8    Frequency
 -4        282           6         77         16          3
 -3        420           7         50         17          5
 -2        426           8         40         18          2
 -1        407           9         31         19          1
  0        348          10         17         20          3
  1        247          11         15         21          1
  2        228          12         13         22          1
  3        156          13          6         23          1
  4        116          14          9         24          0
  5         88          15          6         25          1


From this table we find

$$\Sigma(n - 8) = 965, \qquad \Sigma(n - 8)^2 = 43,395,$$

whence

$$\Sigma\left(n - 8\tfrac{1}{3}\right)^2 = \Sigma(n - 8)^2 - \tfrac{2}{3}\Sigma(n - 8) + \tfrac{3000}{9} = 43,085$$

$$\Sigma n = 24,965.$$

By the law of large numbers we may expect that the quotients

$$\frac{\Sigma n}{3000} \quad \text{and} \quad \frac{\Sigma\left(n - 8\frac{1}{3}\right)^2}{3000}$$

will not considerably differ from $8\frac{1}{3}$ and 14.44, respectively. As a
matter of fact,

$$\frac{\Sigma n}{3000} = 8.322, \qquad \frac{\Sigma\left(n - 8\frac{1}{3}\right)^2}{3000} = 14.362.$$

There is a very satisfactory agreement between the theory and this
experiment in another respect. Of 24,965 cards drawn there were

6,304 hearts
6,236 diamonds
6,131 clubs
6,294 spades

whereas the expected number for each suit is 6241.25.

7. So far, we have dealt with stochastic variables having only a finite 
number of values. However, the notion of mathematical expectation, 
and the propositions essentially based on this notion, can be extended to 
variables with infinitely many values. Here we shall consider the
simplest case of variables with a countable set of values, that can be
arranged in a sequence

$$\cdots < a_{-2} < a_{-1} < a_0 < a_1 < a_2 < \cdots$$

in the order of their magnitude.

With this sequence is associated the sequence of probabilities

$$\ldots, p_{-2}, p_{-1}, p_0, p_1, p_2, \ldots$$

so that in general $p_i$ is the probability for x to assume the value $a_i$.
These probabilities are subject to the condition that the series

$$\Sigma p_i = \cdots + p_{-2} + p_{-1} + p_0 + p_1 + p_2 + \cdots$$

must be convergent with the sum 1.

The definition of mathematical expectation is essentially the same
as that for variables with a finite number of values, but instead of a
finite sum, we have an infinite series

$$E(x) = \Sigma p_ia_i$$

provided this series is convergent (it is absolutely convergent, if convergent
at all). If this series is divergent, it is meaningless to speak of
the mathematical expectation of x. Likewise, the mathematical expectation
of any function $\varphi(x)$ is defined as being the sum of the series

$$E\{\varphi(x)\} = \Sigma p_i\varphi(a_i),$$

provided the latter is convergent.

It can easily be seen that various theorems established in Chap. IX,
as well as Tshebysheff's lemma, continue to hold when the various mathematical
expectations involved exist.

The law of large numbers follows, as a simple corollary, from Tshebysheff's
lemma if the following requirements are fulfilled:

a. Mathematical expectations of all variables $x_1, x_2, x_3, \ldots$ exist.
b. The dispersion $B_n$ of the sum $x_1 + x_2 + \cdots + x_n$ exists.
c. The quotient $B_n/n^2$ tends to 0 as n tends to infinity.

The first requirement is absolutely indispensable. Without it the 
theorem itself cannot be stated. The second requirement (not to speak 
of the third) need not be fulfilled; and still the law of large numbers may 
hold, as Markoff pointed out. 

8. Let $x_1, x_2, x_3, \ldots$ be independent variables. If for every i
the mathematical expectation

$$E(x_i^2)$$

exists, the quantity $B_n$ exists also. But if at least one of these expectations
does not exist, the quantity $B_n$ has no meaning. However, the
following theorem, due to Markoff, holds:

Theorem. The law of large numbers holds, provided that for some
δ > 0 all the mathematical expectations

$$E(|x_i|^{1+\delta}); \qquad i = 1, 2, 3, \ldots$$

exist and are bounded.

Proof. For the sake of simplicity we may assume that

$$E(x_i) = 0; \qquad i = 1, 2, 3, \ldots$$

For, supposing

$$E(x_i) = a_i; \qquad i = 1, 2, 3, \ldots$$

instead of $x_i$, we may consider new variables

$$z_i = x_i - a_i.$$

Then

$$E(z_i) = 0$$

and it remains to prove the existence and boundedness of

$$E(|z_i|^{1+\delta}); \qquad i = 1, 2, 3, \ldots$$

The proof follows immediately from the inequalities

$$|x_i - a_i|^{1+\delta} \le 2^\delta|x_i|^{1+\delta} + 2^\delta|a_i|^{1+\delta}, \qquad |a_i|^{1+\delta} \le E(|x_i|^{1+\delta}),$$

the first of which is well known; the second is a particular case of Liapounoff's
inequality, established in Chap. XIII, page 265.

Thus, from the outset we are entitled to assume that

$$E(x_i) = 0.$$

The proof of the theorem is based on a very ingenious and useful
device due to Markoff. Let N be a positive number which later we shall
increase indefinitely. Together with $x_i$ we shall consider two new variables,
$u_i$ and $v_i$, defined as follows: a being a particular value of $x_i$, the
corresponding values of $u_i$ and $v_i$ are

$$u_i = a, \qquad v_i = 0$$

if $|a| \le N$ and

$$u_i = 0, \qquad v_i = a$$

if $|a| > N$. Thus, stochastic variables $u_i$ and $v_i$ are completely defined.
Evidently

$$x_i = u_i + v_i$$

whence

$$0 = E(u_i) + E(v_i)$$

and

$$\beta_i = E(u_i) = -E(v_i).$$

Now

$$E(|v_i|^{1+\delta}) \le E(|x_i|^{1+\delta}) < c$$

by hypothesis. Since $v_i$ is either 0 or its absolute value is > N, we have

$$N^\delta E(|v_i|) \le E(|v_i|^{1+\delta}) < c,$$

whence

(2)    $$|\beta_i| = |E(v_i)| < \frac{c}{N^\delta}.$$

Likewise, the probability $q_i$ for $v_i \ne 0$ satisfies the inequality

$$N^{1+\delta}q_i \le E(|v_i|^{1+\delta}) < c,$$

whence

(3)    $$q_i < \frac{c}{N^{1+\delta}}.$$
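Markoff's device of splitting $x_i$ into a bounded part $u_i$ and a tail part $v_i$ can be tried on a small example. The sketch below is an illustration under stated assumptions: a two-valued variable with mean 0 chosen arbitrarily, and δ = 1 so that all quantities stay rational. The function and key names are hypothetical.

```python
from fractions import Fraction

def markoff_split(values, probs, N, delta=1):
    """Split x into u (values with |a| <= N) and v (values with |a| > N)."""
    u = [a if abs(a) <= N else 0 for a in values]
    v = [a if abs(a) > N else 0 for a in values]
    beta = sum(p * ui for p, ui in zip(probs, u))            # beta = E(u)
    mean_v = sum(p * vi for p, vi in zip(probs, v))          # E(v)
    abs_v = sum(p * abs(vi) for p, vi in zip(probs, v))      # E(|v|)
    moment_v = sum(p * abs(vi) ** (1 + delta) for p, vi in zip(probs, v))
    q = sum(p for p, vi in zip(probs, v) if vi != 0)         # Prob(v != 0)
    return {"u": u, "v": v, "beta": beta, "mean_v": mean_v,
            "abs_v": abs_v, "moment_v": moment_v, "q": q}

# Illustrative variable with E(x) = 0: x = -1 with probability 3/4, x = 3 with 1/4.
VALUES = [Fraction(-1), Fraction(3)]
PROBS = [Fraction(3, 4), Fraction(1, 4)]
RESULT = markoff_split(VALUES, PROBS, N=2)
```

For this example β = −3/4 = −E(v), and the two inequalities preceding (2) and (3) read 3/2 ≤ 9/4 and 1 ≤ 9/4.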
Now, let us consider two inequalities

(4)    $$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \le \sigma$$

(5)    $$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \sigma$$

where σ is an arbitrary positive number and let $P_0$ and P be their respective
probabilities. The inequalities (4) and (5) coincide when

$$v_1 = v_2 = \cdots = v_n = 0.$$

With this supplementary condition they have the same probability Q.
But they can hold also when at least one of the numbers

$$v_1, v_2, \ldots v_n$$

is different from 0. Let the probabilities of (4) and (5) under such
circumstances be $R_0$ and R. Then

$$P_0 = Q + R_0, \qquad P = Q + R.$$

But evidently neither $R_0$ nor R can exceed the probability that in the
series

$$v_1, v_2, \ldots v_n$$

at least one number is different from 0; this probability in turn does not
exceed (see Chap. II, page 30)

$$q_1 + q_2 + \cdots + q_n < \frac{nc}{N^{1+\delta}}.$$

Hence

$$R_0 < \frac{nc}{N^{1+\delta}}, \qquad R < \frac{nc}{N^{1+\delta}}$$

and

(6)    $$|P - P_0| < \frac{nc}{N^{1+\delta}}.$$

On the other hand, since none of the values of $u_i$ (i = 1, 2, . . . n)
exceeds N numerically, we have

$$E(u_i^2) \le N^{1-\delta}E(|u_i|^{1+\delta}) < cN^{1-\delta}.$$

Accordingly, the dispersion of the sum $u_1 + u_2 + \cdots + u_n$ will be
less than

$$ncN^{1-\delta}.$$

Hence, by what has been proved in Sec. 2, the probability of the inequality

(7)    $$\left|\frac{u_1 + u_2 + \cdots + u_n}{n} - \frac{\beta_1 + \beta_2 + \cdots + \beta_n}{n}\right| \le \frac{\epsilon}{2}$$

is greater than

$$1 - \frac{4cN^{1-\delta}}{\epsilon^2n}.$$

But whenever (7) is satisfied, the inequality

(8)    $$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \le \frac{\epsilon}{2} + \frac{|\beta_1 + \beta_2 + \cdots + \beta_n|}{n}$$

is also satisfied. Hence, the probability of this inequality is a fortiori
greater than

$$1 - \frac{4cN^{1-\delta}}{\epsilon^2n}.$$

Owing to inequalities (2), the following inequality follows from (8):

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| < \frac{\epsilon}{2} + \frac{c}{N^\delta}.$$

Hence, for $\sigma = \frac{\epsilon}{2} + \frac{c}{N^\delta}$,

$$P_0 > 1 - \frac{4cN^{1-\delta}}{\epsilon^2n}$$

and on account of (6)

$$P > 1 - \frac{4cN^{1-\delta}}{\epsilon^2n} - \frac{nc}{N^{1+\delta}}.$$

Now we can dispose of the arbitrary number N by taking

$$N = \frac{\epsilon n}{2}.$$

Then

$$P > 1 - \frac{2^{2+\delta}c}{\epsilon^{1+\delta}n^\delta}.$$

Now N tends to infinity with n and as soon as n surpasses a certain
limit $n_0$, the fraction

$$\frac{c}{N^\delta}$$

will become and remain less than ε/2. The probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| < \epsilon$$

for $n > n_0$ will be greater than P and consequently greater than

$$1 - \frac{2^{2+\delta}c}{\epsilon^{1+\delta}n^\delta}.$$

It tends, therefore, to 1 as n tends to infinity, and that proves Markoff's
theorem.

Example. Let the possible values of the variable x,{p = 1, 2, 3, . . . ) be 
+ 1)1, p-Kp + 1)«. p-Hp + 1)1, . . . 
with the corresponding probabilities 


Since the series 


P P _P_ 

p + l’ (P + 1)*’ (P -H 1)** ’ * ’ ' 



1 

“I— 
P 


+ . . . 


is divergent, the mathematical expectation 


E(xi) 


does not exist. Yet the law of large numbers holds. For 




p-*2' 


.i(p+1) 




is a convergent series for any 0 < δ < 1. Moreover,


r| 


n~l (p + 1)^ 


1- g 
2 2" 


and consequently the conditions of Markoff's theorem are satisfied for any 0 < δ < 1.
Hence, the law of large numbers holds in this example.

9. If variables $x_1, x_2, x_3, \ldots$ are identical, the law of large numbers
holds without any other restrictions, except that for these variables mathematical
expectations exist. In fact, Khintchine proved the following
theorem:

Theorem. If, as we may naturally suppose, $E(x_i) = 0$, the probability
of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \epsilon$$

tends to 1 as n increases indefinitely.

Proof. The proof is quite similar to that of Markoff's theorem and
is based on the same ingenious artifice. Let

$$\cdots < a_{-2} < a_{-1} < a_0 < a_1 < a_2 < \cdots$$

be different values of any one of the identical variables $x_1, x_2, x_3, \ldots$ and

$$\ldots, p_{-2}, p_{-1}, p_0, p_1, p_2, \ldots$$

their probabilities. By hypothesis

$$\Sigma p_ia_i$$

is a convergent series with the sum 0. The series

$$\Sigma p_i|a_i|$$

is also convergent; let c > 0 be its sum.

Keeping the same notations as before, we have

$$|\beta_i| \le E(|v_i|) = \sum_{|a_i| > N} p_i|a_i| = \psi(N)$$

where ψ(N) is a decreasing function tending to 0 as N → ∞. Also

$$E(u_i^2) \le N\,\Sigma p_i|a_i| = cN$$

so that the dispersion of the sum

$$u_1 + u_2 + \cdots + u_n$$

is less than

$$cNn.$$

Consequently the probability of the inequality

(9)    $$\left|\frac{u_1 + u_2 + \cdots + u_n}{n} - \frac{\beta_1 + \beta_2 + \cdots + \beta_n}{n}\right| \le \frac{\epsilon}{2}$$

is greater than

$$1 - \frac{4cN}{\epsilon^2n}.$$

On the other hand, the probability $q_i$ of the inequality $v_i \ne 0$ is less
than

$$\frac{\psi(N)}{N}$$

because

$$N\sum_{|a_i| > N} p_i < \sum_{|a_i| > N} p_i|a_i| = \psi(N)$$

and

$$q_i = \sum_{|a_i| > N} p_i.$$

Hence, the difference between the probability of the inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \le \sigma$$

and that of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \sigma$$

is numerically less than

$$\frac{n\psi(N)}{N}.$$

As in the preceding section we conclude that the probability of the
inequality

$$\left|\frac{u_1 + u_2 + \cdots + u_n}{n}\right| \le \frac{\epsilon}{2} + \psi(N)$$

is greater than

$$1 - \frac{4cN}{\epsilon^2n}.$$

Finally, the probability of the inequality

(10)    $$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \frac{\epsilon}{2} + \psi(N)$$

is greater than

$$1 - \frac{4cN}{\epsilon^2n} - \frac{n\psi(N)}{N}.$$

To dispose of N we observe that the ratio

$$\frac{\sqrt{\psi(N)}}{N}$$

is a decreasing function of N and tends to 0 as N → ∞. Hence, at least
for large n, there exists an integer N such that

VW) ^ VHN - 1 ) 

N ^ tn ^ N -1 ' 

Then 


n^{N) 

N 




^cN 

€*n 


\/4c 




whence it follows that the probability of inequality (10) is greater than 

1 - ^[VW) + - !)]• 

Now N increases indefinitely together with n; therefore, for all n
above a certain limit $n_0$,

$$\psi(N) < \frac{\epsilon}{2}$$

so that for $n > n_0$ the probability of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n}\right| \le \epsilon$$



will be greater than 

1 _ - 1)] 

and with indefinitely increasing n will approach the limit 1. Thus 
Khintchine’s theorem is completely proved. 

Example. Let 

2*“*<>**, 2*“*o«*, . . . . . . 

be all possible values of identical variables xu Xt, Xa, . . . and 

1 Jl 1_ 2. 

2’ 2»’ 2>’ ■ ■ ‘ 2-’ ■ ‘ ■ 


their corresponding probabilities. Since the series 


2«o«i 


^ ^ a*®** ^ 


is convergent, mathematical expectations of the variables $x_1, x_2, x_3, \ldots$ exist.
Hence, the law of large numbers holds in this case.

Markoff's theorem cannot be applied here, because for any positive δ the series


is divergent. 


2 


2»< 

^(l+*)lo«4 


1 


Problems for Solution 

1. Let x be a stochastic variable with the mean = 0 and the standard deviation σ.
Denoting by P(t) the probability of the inequality

$$x \le t$$

show that

$$P(t) \le \frac{\sigma^2}{\sigma^2 + t^2} \quad \text{for } t < 0$$

$$1 - P(t) \le \frac{\sigma^2}{\sigma^2 + t^2} \quad \text{for } t > 0.$$

Show also that the right-hand members cannot be replaced by smaller numbers.
Indication of the Proof. Since

$$\Sigma p_ix_i = 0, \qquad \Sigma p_ix_i^2 = \sigma^2,$$

we have also

$$\Sigma p_i(x_i - t) = -t, \qquad \Sigma p_i(x_i - t)^2 = \sigma^2 + t^2$$

whence, supposing that $x_i > t$ for i = 1, 2, . . . s and first taking t negative,

$$t^2 \le \left\{\sum_{i=1}^{s} p_i(x_i - t)\right\}^2 \le \sum_{i=1}^{s} p_i \cdot \sum_{i=1}^{s} p_i(x_i - t)^2 \le (1 - P(t))(\sigma^2 + t^2).$$

For positive t the proof is quite similar. Considering a stochastic variable with
two values:

$$x_1 = t, \quad p_1 = \frac{\sigma^2}{\sigma^2 + t^2}; \qquad x_2 = -\frac{\sigma^2}{t}, \quad p_2 = \frac{t^2}{\sigma^2 + t^2}$$

one can easily prove the last part of our statement.
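The inequalities of Prob. 1 and the two-valued variable from the hint can be checked in exact arithmetic. The sketch below is illustrative; the second test distribution and all function names are assumptions.

```python
from fractions import Fraction

def one_sided_bound(sigma2, t):
    """Right-hand member sigma^2 / (sigma^2 + t^2)."""
    return sigma2 / (sigma2 + t * t)

def left_tail(values, probs, t):
    """P(t): probability of the inequality x <= t."""
    return sum(p for x, p in zip(values, probs) if x <= t)

def extremal_variable(sigma2, t):
    """Two-valued variable from the hint, with mean 0 and dispersion sigma2."""
    total = sigma2 + t * t
    return [t, -sigma2 / t], [sigma2 / total, t * t / total]

def mean_and_dispersion(values, probs):
    mean = sum(p * x for x, p in zip(values, probs))
    return mean, sum(p * (x - mean) ** 2 for x, p in zip(values, probs))
```

For σ² = 4 and t = −3 the variable takes the values −3 and 4/3 with probabilities 4/13 and 9/13, and P(t) equals the bound 4/13 exactly.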

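2. Tshebysheff's Problem.* If x is a positive stochastic variable with given

$$E(x) = \sigma^2, \qquad E(x^2) = \tau^4$$

then the probability P of the inequality

$$x \ge v$$

has the following precise upper bounds:

$$P \le 1 \quad \text{for } v < \sigma^2$$

$$P \le \frac{\sigma^2}{v} \quad \text{for } \sigma^2 \le v < \frac{\tau^4}{\sigma^2}$$

$$P \le \frac{\tau^4 - \sigma^4}{\tau^4 + v^2 - 2\sigma^2v} \quad \text{for } v \ge \frac{\tau^4}{\sigma^2}.$$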

Indication of the Proof. Let 


Then ^ » if i; ^ T*/<r* and 


€ = 


<r*w -T< 


for X V. On the other hand, 


whence 


/x - {V 

\v-^) (v-O* r*-fi;*~5 


P ^ 


T* +V* - 2ahf 

* Sur les valeurs limites des intégrales. Jour. Liouville, Ser. 2, T. XIX, 1874.
The equality sign is reached for the stochastic variable with two values:


xi * L 


Xt = V, 


Pi 

Pt 


(V - 

t4 4 . p* - 2ah} 
T* - ir* 

t 4 4. pi - 2«r*»’ 


If a* ^ r < r*/<r* we have an obvious inequality 


P ^E 




p 


To show that the right-hand member cannot be replaced by a smaller number, con¬ 
sider the following stochastic variable with three values: 


Xi 

— n 

Pi 

= IL 

— a*)v — fer* 4" T* 

— u. 


Iv 




Ur^ 

-T* 

Xi 

* 

Pi 

~ vil 

-I') 


= 1, 



- 

Xi 

P» 




where 1 > v is an arbitrary number. For this variable 


a* r* — cr*V 

” = Pi + Pj *- : - 

V b> 


is arbitrarily near to for sufficiently large 1. 

3. If x is an arbitrary stochastic variable with given

$$E(x^2) = \sigma^2, \qquad E(x^4) = \tau^4$$

and P denotes the probability of the inequality

$$|x| \ge k\sigma,$$

then 



These inequalities cannot be improved. 

Hint: Follows from Tshebysheff’s problem. 

4. Let $x_i$ assume two values, i and −i, with equal probabilities. Show that the
law of large numbers cannot be applied to variables $x_1, x_2, x_3, \ldots$

5. Variables $x_1, x_2, x_3, \ldots$ each assume two values:

log a or −log a; log (a + 1) or −log (a + 1); log (a + 2) or −log (a + 2); . . .

with equal probabilities. Show that the law of large numbers holds for these variables.

Hint: $E(x_i) = 0$; i = 1, 2, 3, . . .

$$B_n = E(x_1 + x_2 + \cdots + x_n)^2 = \sum_{i=0}^{n-1}\{\log(a + i)\}^2 \sim (a + n - 1)\{\log(a + n - 1)\}^2$$

as can easily be established by using Euler's summation formula (Appendix 1, page
347). Hence

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty.$$

6. If $x_i$ can have only two values with equal probabilities, $i^\alpha$ and $-i^\alpha$, show that the
law of large numbers can be applied to $x_1, x_2, x_3, \ldots$ if $\alpha < \frac{1}{2}$.

Hint:

$$B_n = 1 + 2^{2\alpha} + \cdots + n^{2\alpha} \sim \frac{n^{2\alpha+1}}{2\alpha + 1}; \qquad \frac{B_n}{n^2} \to 0 \quad \text{if } \alpha < \frac{1}{2}.$$

It can be shown that the law of large numbers does not hold if $\alpha \ge \frac{1}{2}$.

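7. In an indefinite Bernoullian series of trials with the constant probability p,
let $m_i$ denote the number of successes in the first i trials. Show that the law of large
numbers holds for variables

$$x_i = \frac{m_i - ip}{(ipq)^\alpha}; \qquad i = 1, 2, 3, \ldots$$

if $\alpha > \frac{1}{2}$.

Hint: Evidently $E(x_i) = 0$, $E(x_i^2) = (ipq)^{1-2\alpha}$ and

$$B_n = \sum_{i=1}^{n} E(x_i^2) + 2\sum_{j>i} E(x_ix_j).$$

Now

$$E(x_ix_j) = (ij)^{-\alpha}(pq)^{-2\alpha}E(m_i - ip)^2 + (ij)^{-\alpha}(pq)^{-2\alpha}E\{(m_i - ip)(m_j - m_i - (j - i)p)\} = i^{1-\alpha}j^{-\alpha}(pq)^{1-2\alpha}$$

since $m_i - ip$ and $m_j - m_i - (j - i)p$ are independent variables. Thus

$$B_n = (pq)^{1-2\alpha}\left[\sum_{i=1}^{n} i^{1-2\alpha} + 2\sum_{i=1}^{n}\sum_{j>i} i^{1-\alpha}j^{-\alpha}\right]$$

and it is easy to show that

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty$$

provided $\alpha > \frac{1}{2}$. But the law of large numbers no longer holds if $\alpha \le \frac{1}{2}$. The
proof of this is more difficult.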

8. The following extension of Tshebysheff's lemma was indicated by Kolmogoroff.
Let $x_1, x_2, \ldots x_n$ be independent variables; $E(x_i) = 0$, $E(x_i^2) = b_i$,

$$B_n = b_1 + b_2 + \cdots + b_n$$

and

$$s_k = x_1 + x_2 + \cdots + x_k; \qquad k = 1, 2, \ldots n.$$

Denoting by P the probability of the inequality

(A)    $$\max(s_1^2, s_2^2, \ldots s_n^2) > B_nt^2$$

we shall have $P < 1/t^2$.

Indication of the Proof. The inequality (A) can materialize if and only if one of
the following mutually exclusive events occurs:

event $e_1$: $s_1^2 > B_nt^2$;
event $e_2$: $s_1^2 \le B_nt^2$; $s_2^2 > B_nt^2$;
event $e_3$: $s_1^2 \le B_nt^2$; $s_2^2 \le B_nt^2$; $s_3^2 > B_nt^2$;
. . .
event $e_n$: $s_1^2 \le B_nt^2$; $s_2^2 \le B_nt^2$; . . . $s_{n-1}^2 \le B_nt^2$; $s_n^2 > B_nt^2$.

If $(e_i)$ represents the probability of $e_i$ (i = 1, 2, . . . n) then

$$P = (e_1) + (e_2) + \cdots + (e_n).$$

Now consider the conditional mathematical expectation $E(s_n^2|e_k)$ of $s_n^2$ given that
$e_k$ has occurred. Since the occurrence of $e_k$ does not affect variables $x_{k+1}, x_{k+2}, \ldots x_n$,
these variables and $s_k$ are independent. Hence

$$E(s_n^2|e_k) = E(s_k^2|e_k) + b_{k+1} + \cdots + b_n > B_nt^2.$$

On the other hand

$$B_n = E(s_n^2) \ge \sum_{k=1}^{n}(e_k)E(s_n^2|e_k) > B_nt^2\{(e_1) + (e_2) + \cdots + (e_n)\}$$

whence $P < 1/t^2$.
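Kolmogoroff's lemma can be checked by complete enumeration in a simple case. The sketch below assumes, purely for illustration, that each $x_i$ takes the values +1 and −1 with equal probabilities, so that $b_i = 1$ and $B_n = n$; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def kolmogoroff_probability(n, t_squared):
    """Exact Prob(max(s_1^2, ..., s_n^2) > B_n t^2) for n independent +-1 variables."""
    b_n = n                                  # each variable has dispersion 1
    hits = 0
    for signs in product((-1, 1), repeat=n):
        partial = 0
        for x in signs:
            partial += x
            if partial * partial > b_n * t_squared:
                hits += 1                    # event (A) has occurred for this sequence
                break
    return Fraction(hits, 2 ** n)
```

For n = 10 the probabilities found for t² = 2 and t² = 4 stay below 1/2 and 1/4 respectively, as the lemma requires.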

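9. The Strong Law of Large Numbers (Kolmogoroff). Using the same notations
as in the preceding problem, show that the probability of the simultaneous inequalities

$$\left|\frac{s_n}{n}\right| \le \epsilon, \qquad \left|\frac{s_{n+1}}{n+1}\right| \le \epsilon, \qquad \left|\frac{s_{n+2}}{n+2}\right| \le \epsilon, \ldots$$

will be greater than 1 − η, provided n exceeds a certain limit depending on the choice
of ε and η, and granted the convergence of the series

$$\sum_{k=1}^{\infty}\frac{b_k}{k^2}.$$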

Indication of the Proof. Consider variables 


T* 


max. 



for Vi 


2<'">n ^ m < 2^; i 


1, 2, 3, 


and denote by the probability of the inequality t< > 
lemma 




9i < 


i X ^ 


By Kolmogoroff's 


2i<-*n*«* 
and

« Z-2‘n-l - Z-2<n-l 

?.+«•+«•+•■• < X *" < X I 

<-l Z-2'-»n »-l Z-2*-»n 

or 

ao 

9i +9. +9i + • • • < 

ifc — n 

Hence, the probability of fulfillment of all the inequalities r< ^ t = 1, 2, 3, . . . 
is greater than 


ib»n 


The inequalities |«*/A;| ^ fc = n, n -|- 1, n -h 2, 
taneously 


and 


r» ^ t = 1, 2, 3, 


Sn-l 

n 


1 

- 2"‘ 


. . are satisfied when simul- 


The probability of the last inequality being greater than 1 — the probability 

n*«* 

of simultaneous inequalities 

^ ^ = n, n + 1, n + 2, . . . 

k 

a fortiori will be greater than 



k = n 


This inequality suffices to complete the proof if we notice that B»/n* tends to 0 when 
the series 



is convergent. 

10. Let xi, Xj, . . . Xn be identical stochastic variables and E{xi) 
by j^n(«) and PnCe), respectively, the probabilities of the inequalities 


0. Denoting 


Xi Xi + • 

n 


show that 


Xn j Xi Xi Xn ^ 

- > € and - < 

n 


hm =-= 0 or -f «> 

n - «p„(€) 


« 


according as E{x\) > or <0, 
For the proof see Khintchine's paper in Mathematische Annalen (vol. 101, pp. 381-385).

11. The Law of the Repeated Logarithm (Khintchine, Kolmogoroff). Let $x_1, x_2,
\ldots x_n$ be bounded independent variables, $E(x_i) = 0$, i = 1, 2, . . . n and $B_n \to \infty$
as n → ∞. For an arbitrarily small δ > 0 and ε > 0 and for an arbitrarily large N
one can choose $n_0 > N$ so that:

a. The probability of the fulfillment of the inequality

$$|s_n| > (1 + \delta)\sqrt{2B_n \log\log B_n}$$

for at least one $n \ge n_0$ is less than ε.

b. The probability of the fulfillment of the inequality

$$|s_n| > (1 - \delta)\sqrt{2B_n \log\log B_n}$$

for at least one $n \ge n_0$ is greater than 1 − ε.

For the proof see Kolmogoroff's paper in Mathematische Annalen (vol. 101, pp. 126-135).

If $x_1, x_2, \ldots x_n$ are variables independent in pairs and $B_n$ the dispersion of their
sum $s = x_1 + x_2 + \cdots + x_n$, then the probability P that

$$|s| \le t\sqrt{2B_n}$$

satisfies the inequality

$$P > 1 - \frac{1}{2t^2} \qquad \text{(Tshebysheff's inequality)}$$

provided $E(x_i) = 0$, i = 1, 2, . . . n, which can be assumed without loss of generality.
In case variables are totally independent and are subject to certain limitations of
comparatively mild character, S. Bernstein has shown that Tshebysheff's inequality can be
considerably improved.

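12. Let $x_1, x_2, \ldots x_n$ be totally independent variables. We suppose $E(x_i) = 0$,
$E(x_i^2) = b_i$ and

$$E(|x_i|^h) \le \frac{b_i}{2}\,h!\,c^{h-2}$$

for i = 1, 2, . . . n and h > 2, c being a certain constant. Show that

$$A = E\left(e^{\epsilon(x_1 + x_2 + \cdots + x_n)}\right) < e^{\frac{B_n\epsilon^2}{2(1-\sigma)}}$$

where σ is an arbitrary positive number < 1 and ε is a positive number so small that
$\epsilon c \le \sigma$.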

Indication of the Proof, We have 

n-2 

h 

«(*"<) s 1 + j.* 2) (“)" < 

n-0 


whence 
13. If Q denotes the probability of the inequality

$$x_1 + x_2 + \cdots + x_n > \frac{B_n\epsilon}{2(1 - \sigma)} + \frac{t^2}{\epsilon}$$

show that $Q < e^{-t^2}$.

Indication of the Proof. If Q′ is the probability of the inequality

$$e^{\epsilon(x_1 + x_2 + \cdots + x_n)} > Ae^{t^2}$$

then, by Tshebysheff's lemma, $Q' < e^{-t^2}$ and $Q \le Q'$ by Prob. 12.

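14. S. Bernstein's Inequality. Denoting by P the probability of the inequality

$$|x_1 + x_2 + \cdots + x_n| \le \omega,$$

ω being a given positive number, show that

$$P > 1 - 2e^{-\frac{\omega^2}{2B_n + 2c\omega}}.$$

Indication of the Proof. To make

$$\frac{B_n\epsilon}{2(1 - \sigma)} + \frac{t^2}{\epsilon} = F$$

minimum take

$$\epsilon = \frac{t\sqrt{2(1 - \sigma)}}{\sqrt{B_n}};$$

then

$$F = t\sqrt{\frac{2B_n}{1 - \sigma}}$$

and t is determined by equating F to ω. The resulting value of ε,

$$\epsilon = \frac{\omega(1 - \sigma)}{B_n},$$

is admissible only if $\epsilon c \le \sigma$ or $\frac{c\omega}{B_n}(1 - \sigma) \le \sigma$. The best choice for σ is

$$\sigma = \frac{c\omega}{B_n + c\omega}$$

and correspondingly

$$t^2 = \frac{\omega^2}{2B_n + 2c\omega}.$$

By Prob. 13 the probability of the inequality

$$x_1 + x_2 + \cdots + x_n > \omega$$

is less than $e^{-\frac{\omega^2}{2B_n + 2c\omega}}$; the same is true of the probability of the inequality

$$x_1 + x_2 + \cdots + x_n < -\omega \quad \text{or} \quad -x_1 - x_2 - \cdots - x_n > \omega.$$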

15. If variables $x_1, x_2, \ldots x_n$ are uniformly bounded and M is an upper bound
of their numerical values, then we may take $c = M/3$.

Indication of the Proof. Note that

$$E(|x_i|^h) \leq M^{h-2} b_i \leq \frac{b_i}{2}\, h! \left(\frac{M}{3}\right)^{h-2}.$$


16. Consider a Poisson's series of trials with probabilities $p_1, p_2, \ldots p_n$ for an
event E to occur. Let $m$ be the frequency of E in $n$ trials,

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}, \qquad \lambda = \frac{1}{n}(p_1q_1 + p_2q_2 + \cdots + p_nq_n).$$

Show that the probability P of the inequality

$$\left|\frac{m}{n} - p\right| \leq \epsilon$$

has the following lower limit:

$$P > 1 - 2e^{-\frac{n\epsilon^2}{2\lambda + \frac{2}{3}\epsilon}}.$$

In the Bernoullian case $p_1 = p_2 = \cdots = p_n = p$, $\lambda = pq$ and consequently

$$P > 1 - 2e^{-\frac{n\epsilon^2}{2pq + \frac{2}{3}\epsilon}}.$$


17. An indefinite series of totally independent variables $x_1, x_2, x_3, \ldots$ has the
property that the mathematical expectation of any odd power of these variables is
rigorously = 0 while

$$E(x_i^{2k}) \leq \left(\frac{b_i}{2}\right)^k \frac{(2k)!}{k!}, \qquad b_i = E(x_i^2)$$

for $i = 1, 2, 3, \ldots$. Prove that the probability of either one of the inequalities

$$x_1 + x_2 + \cdots + x_n > t\sqrt{2B_n} \quad \text{or} \quad x_1 + x_2 + \cdots + x_n < -t\sqrt{2B_n}$$

where $B_n = b_1 + b_2 + \cdots + b_n$ is less than $e^{-t^2}$ (S. Bernstein). Prove first that

$$E(e^{\epsilon x_i}) \leq e^{\frac{b_i \epsilon^2}{2}}.$$

18. Positive and negative proper decimal fractions limited to, say, five decimals,
are obtained in the following manner: From an urn containing tickets with numbers
0, 1, 2, ... 9 in equal proportion, five tickets are drawn in succession (the ticket
drawn in a previous trial being returned before the next) and their respective numbers
are written in succession as five decimals of a proper fraction. This fraction, if not
equal to 0, is preceded by the sign + or -, according as a coin tossed at the same time
shows heads or tails. Thus, repeating this process several times, we may obtain as
many positive or negative proper fractions with five decimals as we desire. What
can be said about the probability that the sum of $n$ such fractions will be contained
between prescribed limits $-\omega$ and $\omega$? Ans. These $n$ fractions may be considered as
so many identical stochastic variables for each of which

$$E(x) = 0, \qquad b = E(x^2) = \frac{(1 - 10^{-5})(2 - 10^{-5})}{6} < \frac{1}{3}.$$

Besides,

$$E(x^{2k}) = \frac{1^{2k} + 2^{2k} + \cdots + (10^5 - 1)^{2k}}{10^{5(2k+1)}} < \frac{1}{2k+1}$$

since in general

$$1^{2k} + 2^{2k} + \cdots + (s-1)^{2k} < \frac{s^{2k+1}}{2k+1}.$$

Again, the inequality

$$E(x^{2k}) \leq \left(\frac{b}{2}\right)^k \frac{(2k)!}{k!}$$

can easily be verified and we can apply the result of Prob. 17. For the required
probability P the following lower limit can be obtained:

$$P > 1 - 2e^{-\frac{3\omega^2}{2n}}$$

or, if $\omega = n\epsilon$,

$$P > 1 - 2e^{-\frac{3}{2}n\epsilon^2}.$$

For example, if $\epsilon = 1/10$ and $n = 814$,

$$P > 0.99999,$$

that is, almost certainly the sum of 814 fractions formed in the above described
manner will be contained between -82 and 82.
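The arithmetic of this closing example can be checked with a short sketch. It assumes the lower limit has the form $1 - 2e^{-\frac{3}{2}n\epsilon^2}$, which is what Prob. 17 gives when $b < 1/3$ and $\omega = n\epsilon$; the function name is illustrative only.

```python
import math


def bernstein_lower_limit(n: int, eps: float) -> float:
    """Lower limit 1 - 2*exp(-1.5*n*eps**2) for P(|sum| <= n*eps).

    Assumes identical variables with E(x) = 0 and b = E(x^2) < 1/3,
    as for the five-decimal fractions of Prob. 18.
    """
    return 1.0 - 2.0 * math.exp(-1.5 * n * eps ** 2)


limit = bernstein_lower_limit(814, 0.1)
print(f"n = 814, eps = 0.1: P > {limit:.7f}")
```

With $n = 814$ the exponent is $-12.21$, which is just enough to push the limit past 0.99999.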
CHAPTER XI


APPLICATIONS OF THE LAW OF LARGE NUMBERS 

1. A theorem of such wide generality as the law of large numbers is a 
source of a great many important particular theorems. We shall begin 
with a generalization of Bernoulli’s theorem due to Poisson. 

Let us consider a series of independent trials with the respective
probabilities $p_1, p_2, p_3, \ldots$, varying from one trial to another.
Considering $n$ trials, we shall denote by $m$ the number of successes. The
arithmetic mean of probabilities in $n$ trials

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}$$

will be called the "mean probability in n trials." With such conditions
and notations adopted, we can state Poisson's theorem as follows:

Poisson's Theorem. The probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

for fixed $\epsilon > 0$, no matter how small, can be made as near to 1 (certainty) as
we please, provided the number of trials n is sufficiently large.

Proof. To show that this theorem is but a particular case of the law
of large numbers, we use an artifice often applied in similar
circumstances, namely, we associate with trials 1, 2, 3, ... n variables
$x_1, x_2, x_3, \ldots x_n$ defined as follows:

$x_i = 1$ in case of success in the $i$th trial,

$x_i = 0$ in case of failure in the $i$th trial.

Since the trials are independent, these variables are also independent.
Moreover

$$E(x_i) = E(x_i^2) = p_i$$

and the dispersion of $x_i$ is

$$p_i - p_i^2 = p_i q_i.$$

The dispersion $B_n$ of the sum

$$x_1 + x_2 + \cdots + x_n$$

is the sum of the dispersions of its terms, that is,

$$B_n = p_1q_1 + p_2q_2 + \cdots + p_nq_n \leq \frac{n}{4}.$$

At the same time, the former sum represents the number of successes $m$.
Now, applying the results established in Chap. X, Sec. 2, we arrive
at this conclusion: Denoting by P the probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

we shall have

$$P > 1 - \frac{B_n}{n^2\epsilon^2} \geq 1 - \frac{1}{4n\epsilon^2}.$$

It now suffices to take

$$n > \frac{1}{4\eta\epsilon^2}$$

to have

$$P > 1 - \eta$$

where $\eta$ is an arbitrary positive number no matter how small. That
completes the proof of Poisson's theorem.
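As a rough numerical companion to the proof, the following sketch computes the smallest number of trials that satisfies $n > 1/(4\eta\epsilon^2)$ and the resulting bound $1 - 1/(4n\epsilon^2)$. The function names are hypothetical, and exact fractions are used only to avoid rounding at the threshold.

```python
import math
from fractions import Fraction


def trials_needed(eps, eta) -> int:
    """Smallest integer n with n > 1/(4*eta*eps**2)."""
    threshold = 1 / (4 * Fraction(eta) * Fraction(eps) ** 2)
    return math.floor(threshold) + 1


def tshebysheff_lower_limit(n: int, eps) -> Fraction:
    """The bound 1 - 1/(4*n*eps**2) used in the proof (B_n <= n/4)."""
    return 1 - 1 / (4 * n * Fraction(eps) ** 2)


eps, eta = Fraction(1, 20), Fraction(1, 100)
n_required = trials_needed(eps, eta)
print(n_required, float(tshebysheff_lower_limit(n_required, eps)))
```

The bound is very conservative: it uses nothing about the trials beyond $p_iq_i \leq 1/4$.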

Evidently Bernoulli's theorem is contained in Poisson's theorem as a
particular case when

$$p_1 = p_2 = \cdots = p_n = p.$$

Poisson himself attached great importance to his theorem and adopted
for it the name of the "law of large numbers," which is still used by many
authors. However, it appears more proper to reserve this name to the
theorem established in Chap. X, Sec. 2, which is due to Tshebysheff.

2. Let us consider $n$ series each consisting of $s$ independent trials with
the constant probability $p$. Also, let

$$m_1, m_2, \ldots m_n$$

represent the number of successes in each of these $n$ series. Stochastic
variables

$$x_1 = (m_1 - sp)^2, \quad x_2 = (m_2 - sp)^2, \quad \ldots \quad x_n = (m_n - sp)^2$$

are independent and identical. Their common mathematical expectation
is $spq$. The law of large numbers can be applied to these variables
and leads immediately to the conclusion: The probability of the inequality

$$\left|\frac{\sum_{i=1}^{n}(m_i - sp)^2}{n} - spq\right| < \epsilon$$

can be brought as near as we please to 1 (or certainty) if the number of
series $n$ is sufficiently large.
Substituting $\epsilon spq$ for $\epsilon$ and dividing through by $spq$, we may state the
same proposition as follows: The probability of the inequalities

$$1 - \epsilon < \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq} < 1 + \epsilon,$$

where $N = ns$ is the total number of trials in all $n$ series, can be brought
as near to 1 as we please if the number of series is sufficiently large.

The law of large numbers can be legitimately applied to the variables

$$x_i = |m_i - sp|; \quad i = 1, 2, 3, \ldots$$

with the common mathematical expectation

$$M_s = 2spq\, C_{s-1}^{\mu-1} p^{\mu-1} q^{s-\mu}$$

where $\mu = [sp + 1]$, and leads to the following proposition: The
probability of the inequalities

$$1 - \epsilon < \frac{\sum_{i=1}^{n}|m_i - sp|}{nM_s} < 1 + \epsilon$$

can be brought as near to 1 as we please if the number of series is
sufficiently large.

For the sake of simplicity, let us use the notations

$$A = \sqrt{\frac{\sum_{i=1}^{n}(m_i - sp)^2}{n}}, \qquad B = \frac{\sum_{i=1}^{n}|m_i - sp|}{n}.$$

The probabilities P and P' of the inequalities

$$(1) \qquad \sqrt{spq}\,(1 - \sigma) < A < \sqrt{spq}\,(1 + \sigma)$$

$$(2) \qquad M_s(1 - \sigma) < B < M_s(1 + \sigma)$$

which are equivalent to

$$(1 - \sigma)^2 < \frac{\sum_{i=1}^{n}(m_i - sp)^2}{nspq} < (1 + \sigma)^2$$

$$1 - \sigma < \frac{\sum_{i=1}^{n}|m_i - sp|}{nM_s} < 1 + \sigma$$

can both be made greater than $1 - \eta$, where $\eta$ is an arbitrarily small
positive number. The probability of simultaneous materialization of
(1) and (2) is not less than

$$P + P' - 1 > 1 - 2\eta.$$

But whenever (1) and (2) hold simultaneously, we have

$$(3) \qquad \frac{\sqrt{spq}}{M_s} \cdot \frac{1 - \sigma}{1 + \sigma} < \frac{A}{B} < \frac{\sqrt{spq}}{M_s} \cdot \frac{1 + \sigma}{1 - \sigma}.$$

Therefore the probability of these inequalities is again $> 1 - 2\eta$. Now
let us take

$$\sigma = \frac{\tau}{2 + \tau}$$

where $\tau$ is another positive number arbitrarily chosen. Then

$$\frac{1 + \sigma}{1 - \sigma} = 1 + \tau; \qquad \frac{1 - \sigma}{1 + \sigma} = \frac{1}{1 + \tau} > 1 - \tau.$$

Hence, the inequalities

$$\frac{\sqrt{spq}}{M_s}(1 - \tau) < \frac{A}{B} < \frac{\sqrt{spq}}{M_s}(1 + \tau)$$

follow from inequalities (3) and their probability is a fortiori $> 1 - 2\eta$.
It suffices to take

$$\tau = \frac{\epsilon M_s}{\sqrt{spq}}$$

to arrive at the following proposition:

The probability of the inequality

$$\left|\frac{A}{B} - \frac{\sqrt{spq}}{M_s}\right| < \epsilon$$

for a fixed $\epsilon$ and sufficiently large number of series can be made as near to
1 as we please.

If $spq$ is somewhat large, the quotient

$$\frac{\sqrt{spq}}{M_s}$$

differs but little from $\sqrt{\pi/2}$ (see Chap. IX, Prob. 2, page 177). Hence,
when the number of series is large and the series themselves sufficiently
long, we may expect with great probability that the quotient

$$\frac{A}{B}$$

will not differ much from $\sqrt{\pi/2}$.
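The closeness of $\sqrt{spq}/M_s$ to $\sqrt{\pi/2}$ can be examined numerically. The sketch below assumes the expression $M_s = 2spq\,C_{s-1}^{\mu-1}p^{\mu-1}q^{s-\mu}$ with $\mu = [sp + 1]$ and $0 < p < 1$; the function names are illustrative, and the logarithm of the binomial coefficient is taken through `math.lgamma` only to avoid underflow for long series.

```python
import math


def mean_abs_deviation(s: int, p: float) -> float:
    """M_s = E|m - sp| from the closed expression with mu = [sp + 1]."""
    q = 1.0 - p
    mu = math.floor(s * p + 1)
    # log of C(s-1, mu-1) written with log-gamma functions
    log_binom = math.lgamma(s) - math.lgamma(mu) - math.lgamma(s - mu + 1)
    log_term = log_binom + (mu - 1) * math.log(p) + (s - mu) * math.log(q)
    return 2.0 * s * p * q * math.exp(log_term)


def mean_abs_deviation_direct(s: int, p: float) -> float:
    """The same expectation summed term by term (short series only)."""
    q = 1.0 - p
    return sum(abs(m - s * p) * math.comb(s, m) * p ** m * q ** (s - m)
               for m in range(s + 1))


for s in (50, 500, 5000):
    ratio = math.sqrt(s * 0.3 * 0.7) / mean_abs_deviation(s, 0.3)
    print(s, round(ratio, 5), round(math.sqrt(math.pi / 2), 5))
```

The direct summation is included only as a cross-check of the closed expression for short series.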
Divergence Coefficient

3. The considerations of the preceding section can be generalized.
Let us consider again $n$ series containing $s$ trials each, and let

$$m_1, m_2, \ldots m_n$$

represent the numbers of successes in each of these series. Without
specifying the nature of the trials (which can be independent or
dependent) we shall denote by $p$ the mean probability in all $N = ns$ trials and
by $q = 1 - p$ its complement. Again considering the quotient

$$Q = \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq}$$

we seek its mathematical expectation

$$E(Q) = D.$$

When all the N trials are of the Bernoullian type, D = 1. But it is also
possible to imagine cases when D > 1 or D < 1. Lexis calls $\sqrt{D}$ the
"coefficient of dispersion." We shall call D itself the "theoretical
divergence coefficient." If $m_1, m_2, \ldots m_n$ are actually observed
frequencies in $n$ series, the quotient

$$D' = \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq}$$

may be called "empirical divergence coefficient." Then, if the law of
large numbers can be applied to variables

$$x_i = \frac{(m_i - sp)^2}{spq}; \quad i = 1, 2, 3, \ldots$$

we can expect with probability, approaching certainty as near as we please,
that the inequality

$$|D' - D| < \epsilon$$

will be fulfilled for an adequately large number of series.

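Thus far we have not specified the nature of the trials. Now we shall
suppose that all $N = ns$ trials, distributed in $n$ series, are independent
but with probabilities varying in general from trial to trial. Let

$$p_{1i}, p_{2i}, \ldots p_{si} \qquad (i = 1, 2, \ldots n)$$

be the probabilities in successive trials of the $i$th series. Their mean

$$p_i = \frac{p_{1i} + p_{2i} + \cdots + p_{si}}{s}$$

is the mean probability in the $i$th series. Finally

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}$$

is the mean probability in all $N = ns$ trials. As to the expectation of
$(m_i - sp)^2$, we find

$$E(m_i - sp)^2 = E\{m_i - sp_i + s(p_i - p)\}^2 = E(m_i - sp_i)^2 + s^2(p_i - p)^2$$

since

$$E(m_i - sp_i) = 0.$$

On the other hand,

$$E(m_i - sp_i)^2 = \sum_{j=1}^{s} p_{ji}q_{ji} = sp_i - \sum_{j=1}^{s} p_{ji}^2$$

and

$$\sum_{j=1}^{s}(p_i - p_{ji})^2 = -sp_i^2 + \sum_{j=1}^{s} p_{ji}^2,$$

whence

$$E(m_i - sp_i)^2 = sp_i - sp_i^2 - \sum_{j=1}^{s}(p_i - p_{ji})^2.$$

Now, letting $i$ take values 1, 2, ... n and taking the sum of the
results, we get

$$\sum_{i=1}^{n} E(m_i - sp_i)^2 = nsp - s\sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n}\sum_{j=1}^{s}(p_i - p_{ji})^2.$$

But

$$s\sum_{i=1}^{n}(p_i - p)^2 = -nsp^2 + s\sum_{i=1}^{n} p_i^2,$$

so that

$$\sum_{i=1}^{n} E(m_i - sp)^2 = nspq + s(s-1)\sum_{i=1}^{n}(p_i - p)^2 - \sum_{i=1}^{n}\sum_{j=1}^{s}(p_i - p_{ji})^2,$$

whence finally, dividing by $Npq$,

$$D = 1 + \frac{s-1}{npq}\sum_{i=1}^{n}(p_i - p)^2 - \frac{1}{Npq}\sum_{i=1}^{n}\sum_{j=1}^{s}(p_i - p_{ji})^2.$$

Two particular cases deserve special attention.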

Two particular cases deserve special attention. 
Lexis' Case. Probabilities remain the same within each series,
but vary from series to series. In this case $p_{ji} = p_i$ and the expression of
D becomes:

$$D = 1 + \frac{s-1}{npq}\sum_{i=1}^{n}(p_i - p)^2.$$

The theoretical divergence coefficient in this case is always greater than
1 and may be arbitrarily large.

Poisson's Case. The probabilities of the corresponding trials in all
series are the same, so that

$$p_{ji} = \pi_j$$

and

$$p_i = \frac{\pi_1 + \pi_2 + \cdots + \pi_s}{s} = p.$$

In this case the divergence coefficient

$$D = 1 - \frac{\sum_{j=1}^{s}(p - \pi_j)^2}{spq}$$

is always less than 1.

Since the law of large numbers evidently is applicable to variables

$$x_i = \frac{(m_i - sp)^2}{spq},$$

we may expect that the empirical divergence coefficient D' will not
differ much from D if the number of series is sufficiently large.

For numerical illustration let us consider 100 series each containing
100 trials, such that in 50 series the probability is 2/5 and in the remaining
50 series it is 3/5. Here we evidently have Lexis' case. The mean
probability in all trials is

$$p = \frac{1}{2}$$

and

$$\sum_{i=1}^{100}(p_i - p)^2 = 50 \cdot \frac{1}{100} + 50 \cdot \frac{1}{100} = 1.$$

Finally,

$$D = 1 + \frac{99}{25} = 4.96.$$

Now, suppose that we combine in pairs series of 100 trials with
probability 2/5 and series of 100 trials with probability 3/5 to form 50
series each of 200 trials. Evidently we have here Poisson's case. The
mean probability in each series again is $p = 1/2$ and

$$\sum_{j=1}^{200}(p - \pi_j)^2 = 100 \cdot \frac{1}{100} + 100 \cdot \frac{1}{100} = 2.$$

Finally,

$$D = 1 - \frac{1}{25} = 0.96.$$
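Both numerical values can be reproduced from the general expression for D. The sketch below assumes the probabilities 2/5 and 3/5 (the pair consistent with the stated results 4.96 and 0.96) and represents the trials as a list of series, each a list of probabilities; the function names and this layout are illustrative assumptions, not part of the text.

```python
import math


def divergence_coefficient(probs):
    """D = 1 + (s-1)/(npq) * sum (p_i - p)^2 - 1/(Npq) * sum sum (p_i - p_ji)^2."""
    n, s = len(probs), len(probs[0])
    series_means = [sum(row) / s for row in probs]
    p = sum(series_means) / n
    q = 1.0 - p
    between = sum((p_i - p) ** 2 for p_i in series_means)
    within = sum((p_i - p_ji) ** 2
                 for p_i, row in zip(series_means, probs) for p_ji in row)
    return 1.0 + (s - 1) / (n * p * q) * between - within / (n * s * p * q)


def divergence_coefficient_direct(probs):
    """Same quantity from E(m_i - sp)^2 = sum_j p_ji*q_ji + s^2 (p_i - p)^2."""
    n, s = len(probs), len(probs[0])
    p = sum(sum(row) for row in probs) / (n * s)
    total = 0.0
    for row in probs:
        p_i = sum(row) / s
        total += sum(x * (1.0 - x) for x in row) + s ** 2 * (p_i - p) ** 2
    return total / (n * s * p * (1.0 - p))


lexis = [[0.4] * 100 for _ in range(50)] + [[0.6] * 100 for _ in range(50)]
poisson = [[0.4] * 100 + [0.6] * 100 for _ in range(50)]
print(round(divergence_coefficient(lexis), 4))
print(round(divergence_coefficient(poisson), 4))
```

The second function bypasses the final formula and sums the expectations series by series, so agreement between the two is a check on the algebra rather than on the data.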

The consideration of the divergence coefficient may be useful in 
testing the assumed independence of trials and values of probabilities 
attached to these trials. In the simplest case of Bernoullian trials with 
a constant and known probability, the theoretical divergence coefficient 
is 1. Now, if the number of series is sufficiently large and the empirical
divergence coefficient turns out to be considerably different from 1,
we must admit with great probability that the trials we deal with are not
of the supposed type. If, however, the empirical divergence coefficient
turns out to be near 1, that does not conclusively prove the hypothesis
concerning the independence of trials and the assumed value of the
probability. It only makes this hypothesis plausible.

There are cases of dependent trials (complex chains considered by 
Markoff) in which the theoretical divergence coefficient is exactly 1 and 
the probability of an event has the same constant value in each trial, 
insofar as the results of other trials remain unknown. Cases like that 
may easily be mistaken for Bernoullian trials without further detailed 
study of the entire course of trials. 

4. When there is good reason to believe that the trials are independent
with a constant but unknown probability, we cannot in all rigor find the
value of the empirical divergence coefficient

$$D' = \frac{\sum_{i=1}^{n}(m_i - sp)^2}{Npq}$$

to compare it with the theoretical divergence coefficient D = 1, since $p$
remains unknown.

But, relying on Bernoulli's theorem, we can take the quotient

$$\frac{M}{N}$$

where

$$M = m_1 + m_2 + \cdots + m_n$$

as an approximate value of $p$. By taking $p = M/N$ in the preceding
expression for D' we get another number

$$D'' = \frac{N\sum_{i=1}^{n}\left(m_i - s\frac{M}{N}\right)^2}{M(N - M)}$$

which in general is close to D'. However, considering $m_1, m_2, \ldots m_n$
not as observed but as eventual numbers of successes in $n$ series, the
mathematical expectation of D'' is different from 1. To avoid this
difficulty, it is better to consider a slightly different quotient

$$Q = \frac{n(N - 1)\sum_{i=1}^{n}\left(m_i - s\frac{M}{N}\right)^2}{(n - 1)M(N - M)}.$$

For this quotient there exists a theorem discovered and proved for the
first time by the eminent Russian statistician Tschuprow.

Theorem. The mathematical expectation of Q is rigorously equal to 1.¹
Proof. Here we shall develop the proof given by Markoff. The
above given expression of Q presents itself in the form 0/0 and therefore
has no meaning in two cases: M = 0 or M = N. For these exceptional
cases we set Q = 1 by definition. If neither M = 0 nor M = N, we
can present Q in the form

$$(4) \qquad Q = \frac{n(N - 1)}{n - 1} \cdot \frac{\sum_{i=1}^{n} m_i^2 - \frac{M^2}{n}}{M(N - M)}.$$

Considering $m_1, m_2, \ldots m_n$ as stochastic variables assuming integral
values from 0 to $s$, the probability of a definite system of values

$$m_1, m_2, \ldots m_n$$

is

$$P = \frac{(s!)^n}{m_1!(s - m_1)!\, m_2!(s - m_2)! \cdots m_n!(s - m_n)!}\; p^{m_1 + m_2 + \cdots + m_n} q^{N - m_1 - m_2 - \cdots - m_n}.$$

To get the expectation of Q we must multiply it by P and take the
sum

$$E(Q) = \sum PQ$$

extended over all non-negative integers $m_1, m_2, \ldots m_n$, each of them
not exceeding $s$. To perform this multiple summation we first collect
all terms with a given sum

$$m_1 + m_2 + \cdots + m_n = M.$$

¹ The theorem itself and its proof given by Markoff can be extended to the case of
series of unequal length.
Let the result of this summation be $S_M$. Then it remains to take the
sum

$$\sum_{M=0}^{N} S_M$$

to have the desired expression E(Q). To this end we first separate two
terms corresponding to M = 0 and M = N. In the former case

$$m_1 = m_2 = \cdots = m_n = 0$$

and the probability of such an event is $q^N$ while Q = 1. In the latter
case

$$m_1 = m_2 = \cdots = m_n = s$$

the probability of which is $p^N$ while again Q = 1. Thus

$$E(Q) = p^N + q^N + \sum_{M=1}^{N-1} S_M.$$

To find $S_M$ we observe that the denominator of Q has a constant value
when summation is performed over variable integers $m_1, m_2, \ldots m_n$
connected by the relation

$$m_1 + m_2 + \cdots + m_n = M.$$

Hence, it suffices to find two sums

$$\sum P \quad \text{and} \quad \sum P m_i^2$$

extended over integers $m_1, m_2, \ldots m_n$ varying within limits 0 and $s$
and having the sum M. To this end consider the function

$$V = (pe^{\xi + \xi_1} + q)^s (pe^{\xi + \xi_2} + q)^s \cdots (pe^{\xi + \xi_n} + q)^s$$

involving $n + 1$ arbitrary variables $\xi, \xi_1, \xi_2, \ldots \xi_n$. When developed,
V consists of terms of the form

$$P e^{M\xi + m_1\xi_1 + m_2\xi_2 + \cdots + m_n\xi_n}.$$

Evidently we obtain the sum $\sum P$ by setting $\xi_1 = \xi_2 = \cdots = \xi_n = 0$
and taking the coefficient of $e^{M\xi}$ in the expansion

$$(V)_{\xi_1 = \cdots = \xi_n = 0} = (pe^{\xi} + q)^N.$$

Thus

$$(5) \qquad \sum P = \frac{N!}{M!(N - M)!}\, p^M q^{N-M}.$$

To find $\sum P m_i^2$ take the second derivative

$$\frac{\partial^2 V}{\partial \xi_i^2}$$

and after setting $\xi_1 = \xi_2 = \cdots = \xi_n = 0$, expand

$$\left(\frac{\partial^2 V}{\partial \xi_i^2}\right)_{\xi_1 = \cdots = \xi_n = 0} = spe^{\xi}(pe^{\xi} + q)^{N-1} + s(s - 1)p^2e^{2\xi}(pe^{\xi} + q)^{N-2}$$

and take the coefficient of $e^{M\xi}$. Thus we find

$$(6) \qquad \sum P m_i^2 = \left[s\,C_{N-1}^{M-1} + s(s - 1)\,C_{N-2}^{M-2}\right] p^M q^{N-M}.$$

Referring to (4), (5), and (6), we easily get

$$S_M = \frac{n(N - 1)}{(n - 1)M(N - M)} \cdot \frac{(N - 2)!\,N}{n(M - 1)!(N - M)!}\left[n(N - 1) + (N - n)(M - 1) - M(N - 1)\right] p^M q^{N-M};$$

or, after obvious simplifications,

$$S_M = \frac{N!}{M!(N - M)!}\, p^M q^{N-M}.$$

Hence

$$\sum_{M=1}^{N-1} S_M = (p + q)^N - p^N - q^N = 1 - p^N - q^N$$

and finally

$$E(Q) = 1.$$
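Tschuprow's theorem lends itself to a direct check on small cases. The sketch below enumerates every system of values $m_1, m_2, \ldots m_n$ for a few short series and sums $PQ$ in exact rational arithmetic, with the convention Q = 1 when M = 0 or M = N. It is an illustrative verification under these assumptions, not the method of the proof, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def expectation_of_q(n: int, s: int, p: Fraction) -> Fraction:
    """E(Q) for n >= 2 Bernoullian series of s trials each, by full enumeration."""
    q = 1 - p
    N = n * s
    total = Fraction(0)
    for ms in product(range(s + 1), repeat=n):
        M = sum(ms)
        prob = Fraction(1)
        for m in ms:
            prob *= comb(s, m) * p ** m * q ** (s - m)
        if M == 0 or M == N:
            value = Fraction(1)  # exceptional cases, Q = 1 by definition
        else:
            spread = sum(m * m for m in ms) - Fraction(M * M, n)
            value = Fraction(n * (N - 1), (n - 1) * M * (N - M)) * spread
        total += prob * value
    return total


print(expectation_of_q(3, 4, Fraction(1, 3)))
print(expectation_of_q(4, 2, Fraction(2, 7)))
```

Enumeration grows as $(s + 1)^n$, so this is practical only for a handful of short series.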

Markoff, using the same method, succeeded in finding the explicit
expression of the expectation

$$E(Q - 1)^2.$$

Since there is no difficulty in finding this expression except for
somewhat tedious calculations, we give it here without entering into details
of the proof:

$$E(Q - 1)^2 = \frac{2N(N - n)}{(n - 1)(N - 2)(N - 3)} \sum_{M=1}^{N-1} \frac{M - 1}{M} \cdot \frac{N - M - 1}{N - M}\; C_N^M p^M q^{N-M}$$

whence the following inequality immediately follows:

$$E(Q - 1)^2 < \frac{2N(N - n)}{(n - 1)(N - 2)(N - 3)}.$$

In case $n \geq 5$ a still simpler inequality holds:

$$(7) \qquad E(Q - 1)^2 < \frac{2}{n - 1}.$$
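The expression for $E(Q - 1)^2$ can be compared with a brute-force enumeration on small cases. The sketch below assumes the closed form reads $\frac{2N(N-n)}{(n-1)(N-2)(N-3)}\sum_{M=1}^{N-1}\frac{M-1}{M}\cdot\frac{N-M-1}{N-M}\,C_N^M p^M q^{N-M}$ and requires N > 3; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def second_moment_exact(n: int, s: int, p: Fraction) -> Fraction:
    """E(Q - 1)^2 by enumerating all systems m_1, ..., m_n."""
    q = 1 - p
    N = n * s
    total = Fraction(0)
    for ms in product(range(s + 1), repeat=n):
        M = sum(ms)
        if M == 0 or M == N:
            continue  # Q = 1 by definition, so (Q - 1)^2 contributes nothing
        prob = Fraction(1)
        for m in ms:
            prob *= comb(s, m) * p ** m * q ** (s - m)
        spread = sum(m * m for m in ms) - Fraction(M * M, n)
        value = Fraction(n * (N - 1), (n - 1) * M * (N - M)) * spread
        total += prob * (value - 1) ** 2
    return total


def second_moment_closed_form(n: int, s: int, p: Fraction) -> Fraction:
    """The assumed closed expression for E(Q - 1)^2."""
    q = 1 - p
    N = n * s
    series = sum(Fraction((M - 1) * (N - M - 1), M * (N - M))
                 * comb(N, M) * p ** M * q ** (N - M) for M in range(1, N))
    return Fraction(2 * N * (N - n), (n - 1) * (N - 2) * (N - 3)) * series


p = Fraction(1, 3)
print(second_moment_exact(2, 3, p), second_moment_closed_form(2, 3, p))
```

For two series of two trials both expressions reduce to $12p^2q^2$, which is easy to confirm by hand.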






Let R be the probability of the inequality

$$Q \geq 1 + \epsilon$$

where $\epsilon$ is a positive number. Applying the same reasoning to inequality
(7) as was used in establishing Tshebysheff's lemma, we find that

$$R < \frac{2}{(n - 1)\epsilon^2}.$$

Likewise, denoting by R' the probability of the inequality

$$Q \leq 1 - \epsilon,$$

we have

$$R' < \frac{2}{(n - 1)\epsilon^2}.$$

Thus, in a large number of series it becomes very unlikely that the
value of Q found in actual experiment would lie outside of the interval
$1 - \epsilon$, $1 + \epsilon$. For instance, the probability for $Q \geq 2$ in 100 series is
surely less than

$$\frac{2}{99}$$

or nearly 0.02. However, this limit is much too high. It would be
greatly desirable to have a good approximate expression for the
probability of either one of the inequalities

$$Q \geq 1 + \epsilon \quad \text{or} \quad Q \leq 1 - \epsilon.$$

But this important and difficult problem has not yet been solved.

5. In order to illustrate the foregoing theoretical considerations we
turn to experiments reported by Charlier in his book "Vorlesungen
über die Grundzüge der mathematischen Statistik" (Lund, 1920). He
made 10,000 drawings of single cards from a complete deck of 52 cards
(each card taken being returned before the next drawing), and noted
the frequency of black cards. The drawings were divided into 1,000
series of 10 cards, or into 200 series of 50 cards. The results are given
in the tables on page 220.

Assuming the independence of trials and the constant probability
$p = 1/2$, the theoretical divergence coefficient must be 1. Let us compare
it with the empirical divergence coefficient derived from Tables I and II.
To this end we multiply the squares of numbers in the second column
by the numbers given in the third column. The results are:

For 200 series of 50 cards:    $\sum (m_i - ps)^2 = 2,487$

For 1,000 series of 10 cards:  $\sum (m_i - ps)^2 = 2,419$
Table I. Number of Black Cards in 200 Groups of 50 Cards Each

    Frequency    Difference    Number of groups
        m          m - 25      with these frequencies

       14           -11
       15           -10
       16            -9
       17            -8
       18            -7
       19            -6                8
       20            -5                6
       21            -4               15
       22            -3               13
       23            -2               15
       24            -1               34
       25             0               14
       26             1               21
       27             2               26
       28             3               14
       29             4               10
       30             5                5
       31             6                5
       32             7                3
       33             8                2

Table II. Number of Black Cards in 1,000 Groups of 10 Cards Each

    Frequency    Difference    Number of groups
        m          m - 5       with these frequencies

        0            -5                3
        1            -4               10
        2            -3               43
        3            -2              116
        4            -1              221
        5             0              247
        6             1              202
        7             2              115
        8             3               34
        9             4                9
       10             5                0

Dividing these numbers by $10,000 \cdot \frac{1}{4} = 2,500$, we get the following
empirical divergence coefficients:

D' = 0.9948; D'' = 0.9676.

Both are close to 1, so that the hypotheses of independence of trials
and constant probability for each of them, are in good agreement with
empirical results. The second divergence coefficient, corresponding to
more numerous groups, differs from 1 more than the first, corresponding
to only 200 groups. But such a difference can be accounted for by
fluctuations due to chance.

Series of 50 trials are long enough to test the theorem established in 
Sec. 2 of this chapter. The quantities denoted there by A and B are 
here correspondingly: 

= A = 3.5263 

B = 4H; B = 2.805 

whence 

g = 1.2571 
while

= 1.2633. 

Again the difference, only about $4 \cdot 10^{-3}$, is rather small.

In this example, the probability of drawing a black card was assumed
to be 1/2. In case we do not know the probability, but suppose it to be
constant throughout 10,000 independent trials, we must consider the
coefficient

$$Q = \frac{n(N - 1)}{(n - 1)M(N - M)} \sum_{i=1}^{n}\left(m_i - s\frac{M}{N}\right)^2.$$

In our example

$$n = 1,000; \quad N = 10,000; \quad M = 4,933; \quad s = 10; \quad s\frac{M}{N} = 4.933.$$

To evaluate the sum

$$S = \sum_{i=1}^{1,000}(m_i - 4.933)^2$$

we write it in the form

$$S = \sum_{i=1}^{1,000}(m_i - 5)^2 + 0.134\sum_{i=1}^{1,000}(m_i - 5) + 1,000 \cdot (0.067)^2.$$

Now

$$\sum_{i=1}^{1,000}(m_i - 5)^2 = 2,419$$

$$1,000 \cdot (0.067)^2 = 4.489$$

$$0.134\sum_{i=1}^{1,000}(m_i - 5) = -8.978$$

$$S = 2,414.51$$

This is to be multiplied by the number

$$\frac{n(N - 1)}{(n - 1)M(N - M)} = \frac{1}{2497.3}.$$

The result is

$$Q = 0.9668,$$

near enough to 1 for us to consider the hypothesis of independence of
trials and the constant value of probability as in agreement with
experimental data.
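The computations for the 1,000 series of 10 cards can be retraced from the frequencies of Table II. In the sketch below the count for frequency 7 is taken as 115, the value consistent with the totals 1,000, 4,933 and 2,419 quoted in the text; the variable names are illustrative.

```python
# Number of groups of 10 cards containing m black cards (Table II).
# The entry for m = 7 is taken as 115, which agrees with the quoted totals.
counts = {0: 3, 1: 10, 2: 43, 3: 116, 4: 221, 5: 247,
          6: 202, 7: 115, 8: 34, 9: 9, 10: 0}

s = 10                                           # cards per series
n = sum(counts.values())                         # number of series
N = n * s                                        # total number of drawings
M = sum(m * c for m, c in counts.items())        # total number of black cards

# Empirical divergence coefficient with the assumed probability p = 1/2.
p = 0.5
squares_about_5 = sum(c * (m - s * p) ** 2 for m, c in counts.items())
d_known_p = squares_about_5 / (N * p * (1 - p))

# The quotient Q, which replaces the unknown p by M/N.
S = sum(c * (m - s * M / N) ** 2 for m, c in counts.items())
Q = n * (N - 1) * S / ((n - 1) * M * (N - M))

print(n, M, squares_about_5, round(d_known_p, 4), round(S, 2), round(Q, 4))
```

The two coefficients differ only in the fourth decimal here because the observed frequency M/N = 0.4933 is close to the assumed value 1/2.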


Examples of Dependent Trials

6. So far we have dealt only with independent variables. But the
law of large numbers holds, under certain conditions, even in the case of
dependent variables. Leaving aside generalities, we shall show the
application of the law of large numbers to a few interesting problems involving
dependent variables.

Let us consider first a Bernoullian series consisting of $n + 1$
independent trials with the same probability $p$ for an event E, the opposite
event being denoted by F. We associate with trials 1, 2, ... n variables
$x_1, x_2, \ldots x_n$ defined as follows:

$x_i = 1$ if E occurs in trials $i$ and $i + 1$,
$x_i = 0$ in all other cases.

The probability of $x_i = 1$ evidently is $p^2$ when nothing is known about
the values of other variables. But if we know that $x_{i-1} = 1$, which
implies the occurrence of E in the $i$th trial, then the probability of $x_i = 1$
is $p$. Thus, consecutive variables are dependent. However, $x_i$ and $x_k$
are independent if $|k - i| > 1$, as we can easily see. Since

$$E(x_i) = E(x_i^2) = p^2 \cdot 1 + (1 - p^2) \cdot 0 = p^2$$

the expectation of the sum $x_1 + x_2 + \cdots + x_n$ will be

$$E(x_1 + x_2 + \cdots + x_n) = np^2.$$

As to the dispersion of this sum, it can be expressed as follows:

$$B_n = \sum_{i=1}^{n} E(x_i - p^2)^2 + 2\sum_{j>i} E(x_i - p^2)(x_j - p^2).$$

Now

$$(8) \qquad E(x_i - p^2)^2 = E(x_i^2) - 2p^2E(x_i) + p^4 = p^2(1 - p^2)$$

and

$$(9) \qquad E(x_i - p^2)(x_j - p^2) = E(x_i - p^2) \cdot E(x_j - p^2) = 0$$

for $j > i + 1$ because then $x_i$ and $x_j$ are independent. But

$$(10) \qquad E(x_i - p^2)(x_{i+1} - p^2) = E(x_ix_{i+1}) - p^4 = p^3 - p^4$$

since the probability of simultaneous events

$$x_i = 1, \quad x_{i+1} = 1$$

is $p^3$. Taking into account (8), (9), and (10), we find

$$B_n = np^2q(3p + 1) - 2p^3q$$

and the condition

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty$$

is satisfied. Hence, the law of large numbers holds for variables $x_1,
x_2, \ldots x_n$. To express it in the simplest form, it suffices to notice that
the sum

$$x_1 + x_2 + \cdots + x_n$$

represents the number of pairs EE occurring in consecutive trials of the
Bernoullian series of $n + 1$ trials. Let us denote the frequency of such
pairs by $m$. Then, referring to the law of large numbers, we get the
following proposition:

If in n consecutive pairs of Bernoullian trials the frequency of double
successes EE is m, then the probability of the inequality

$$\left|\frac{m}{n} - p^2\right| < \epsilon$$

will approach 1 as near as we please, when n becomes sufficiently large.
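The expression $B_n = np^2q(3p + 1) - 2p^3q$ for the dispersion of the number of pairs EE can be confirmed on short series by enumerating every outcome of the $n + 1$ trials. The sketch below does so in exact arithmetic; the function names are illustrative and the enumeration is practical only for small n.

```python
from fractions import Fraction
from itertools import product


def double_success_moments(n: int, p: Fraction):
    """Exact mean and dispersion of the number of pairs EE in n + 1 trials."""
    q = 1 - p
    mean = Fraction(0)
    second = Fraction(0)
    for outcome in product((0, 1), repeat=n + 1):   # 1 stands for E, 0 for F
        prob = Fraction(1)
        for e in outcome:
            prob *= p if e else q
        pairs = sum(outcome[i] * outcome[i + 1] for i in range(n))
        mean += prob * pairs
        second += prob * pairs ** 2
    return mean, second - mean ** 2


def dispersion_formula(n: int, p: Fraction) -> Fraction:
    """B_n = n p^2 q (3p + 1) - 2 p^3 q."""
    q = 1 - p
    return n * p ** 2 * q * (3 * p + 1) - 2 * p ** 3 * q


p = Fraction(1, 3)
mean, dispersion = double_success_moments(6, p)
print(mean, dispersion, dispersion_formula(6, p))
```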

7. Simple chains of trials, described in Chap. V, Sec. 1, offer a good
example of dependent trials to which the law of large numbers can be
applied. Let $p_1$ be the given probability of an event E in the first trial.
According to the definition of a simple chain, the probability of E in
any subsequent trial is $\alpha$ or $\beta$ according as E occurred or failed to occur
in the preceding trial. By $p_n$ we denote the probability for E to occur
in the $n$th trial when the results of other trials are unknown. Let
$\delta = \alpha - \beta$ and

$$p = \frac{\beta}{1 - \delta}.$$

Then, according to the developments in Chap. V, Sec. 2,

$$p_n = p + (p_1 - p)\delta^{n-1},$$

whence

$$\frac{p_1 + p_2 + \cdots + p_n}{n} = p + \frac{p_1 - p}{n} \cdot \frac{1 - \delta^n}{1 - \delta}$$

barring the trivial cases $\delta = 1$ or $\delta = -1$. It follows that $p$ represents
the limit of the mean probability in $n$ trials when $n$ increases indefinitely,
and for that reason $p$ may be called the mean probability in an infinite
chain of trials. When it is known that E has occurred in the $i$th trial, its
probability of occurring in some subsequent $j$th trial is given by

$$p_j^{(i)} = p + q\delta^{j-i}; \quad q = 1 - p.$$

In the usual way we associate with trials 1, 2, 3, . . . variables
$x_1, x_2, x_3, \ldots$ so that in general

$x_i = 1$ when E occurs in the $i$th trial,
$x_i = 0$ when E fails to occur in the $i$th trial.

Evidently

$$E(x_i) = E(x_i^2) = p_i.$$

In order to prove that the law of large numbers can be applied to
variables $x_1, x_2, x_3, \ldots$, we must have an idea of the behavior of $B_n$
for large $n$. By definition

$$B_n = E(x_1 - p_1 + x_2 - p_2 + \cdots + x_n - p_n)^2 = \sum_{i=1}^{n} E(x_i - p_i)^2 + 2\sum_{j>i} E[(x_i - p_i)(x_j - p_j)].$$

The first sum can easily be found. We have

$$E(x_i - p_i)^2 = p_i - p_i^2 = pq + (q - p)(p_1 - p)\delta^{i-1} - (p_1 - p)^2\delta^{2i-2}$$

whence

$$A = \sum_{i=1}^{n} E(x_i - p_i)^2 \sim npq$$

neglecting terms which remain bounded. As to the second sum, we
observe first that

$$E(x_i - p_i)(x_j - p_j) = E(x_ix_j) - p_ip_j.$$

Again, since the probability of

$$x_ix_j = 1$$

is evidently $p_ip_j^{(i)}$, where $p_j^{(i)}$ is the probability of E in the $j$th trial
when E is known to have occurred in the $i$th, we have

$$E(x_ix_j) = p_ip_j^{(i)}$$

and

$$E(x_i - p_i)(x_j - p_j) = p_i(p_j^{(i)} - p_j) = pq\delta^{j-i} + (p_1 - p)(q - p)\delta^{j-1} - (p_1 - p)^2\delta^{i+j-2}.$$

Now, for a fixed $i = 1, 2, \ldots n - 1$, we must take the sum of these
expressions letting $j$ run over $i + 1, i + 2, \ldots n$. The result of this
summation is

$$pq\,\frac{\delta - \delta^{n-i+1}}{1 - \delta} + (p_1 - p)(q - p)\frac{\delta^i - \delta^n}{1 - \delta} - (p_1 - p)^2\frac{\delta^{2i-1} - \delta^{n+i-1}}{1 - \delta}.$$




Taking $i = 1, 2, 3, \ldots n - 1$ and neglecting in the sum the terms
which remain bounded, we get

$$B = \sum_{j>i} E(x_i - p_i)(x_j - p_j) \sim npq\,\frac{\delta}{1 - \delta}$$

whence

$$B_n = A + 2B \sim npq\,\frac{1 + \delta}{1 - \delta}.$$

This asymptotic equality suffices to show that

$$\frac{B_n}{n^2} \to 0 \quad \text{as } n \to \infty.$$

Therefore the law of large numbers can be applied to variables $x_1,
x_2, x_3, \ldots$. Since the sum

$$x_1 + x_2 + \cdots + x_n = m$$

represents the frequency of E in $n$ trials, the law of large numbers in
this particular case can be stated as follows: For a fixed $\epsilon > 0$, no matter
how small, the probability of the inequality

$$\left|\frac{m}{n} - \frac{p_1 + p_2 + \cdots + p_n}{n}\right| < \epsilon$$

tends to 1 as $n \to \infty$.
The arithmetic mean

$$\frac{p_1 + p_2 + \cdots + p_n}{n}$$

itself approaches the limit $p$. It is easy then to express the preceding
theorem thus: The probability of the inequality

$$\left|\frac{m}{n} - p\right| < \epsilon$$

tends to 1 as $n \to \infty$.

This proposition is of exactly the same type as Bernoulli's theorem,
but applies to series of dependent trials.
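The asymptotic value $npq\frac{1 + \delta}{1 - \delta}$ of the dispersion can be compared with an exact computation for a chain of moderate length. The sketch below assumes $\delta = \alpha - \beta$ and $p = \beta/(1 - \delta)$ and obtains the exact distribution of the frequency $m$ by a step-by-step recursion over the trials; the function name and the numerical values $p_1 = 1/2$, $\alpha = 1/2$, $\beta = 1/4$ are illustrative choices.

```python
def chain_frequency_dispersion(n: int, p1: float, alpha: float, beta: float) -> float:
    """Exact dispersion of the frequency of E in n trials of a simple chain."""
    # prob[e][k]: probability that E has occurred k times so far and that the
    # latest trial produced E (e = 1) or failed to produce it (e = 0).
    prob = [[0.0] * (n + 1) for _ in range(2)]
    prob[1][1] = p1
    prob[0][0] = 1.0 - p1
    for _ in range(n - 1):
        new = [[0.0] * (n + 1) for _ in range(2)]
        for k in range(n):
            after_e, after_f = prob[1][k], prob[0][k]
            new[1][k + 1] += after_e * alpha + after_f * beta
            new[0][k] += after_e * (1.0 - alpha) + after_f * (1.0 - beta)
        prob = new
    mean = sum(k * (prob[0][k] + prob[1][k]) for k in range(n + 1))
    second = sum(k * k * (prob[0][k] + prob[1][k]) for k in range(n + 1))
    return second - mean ** 2


n, p1, alpha, beta = 400, 0.5, 0.5, 0.25
delta = alpha - beta
p = beta / (1.0 - delta)
asymptotic = n * p * (1.0 - p) * (1.0 + delta) / (1.0 - delta)
exact = chain_frequency_dispersion(n, p1, alpha, beta)
print(round(exact, 3), round(asymptotic, 3), round(exact / asymptotic, 4))
```

The difference between the two numbers stays bounded as n grows, which is all that the condition $B_n/n^2 \to 0$ requires.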

8. Let a simple chain of $N = ns$ trials be divided into $n$ consecutive
series each consisting of $s$ trials; also, let $m_1, m_2, \ldots m_n$ be the
frequencies of E in each of these series. When N is a large number, the
mean probability in N trials differs little from the quantity denoted by $p$.
It is natural to modify the definition of the divergence coefficient given
in Sec. 3 by taking $p$ instead of the variable mean probability in N trials.
Thus we define

$$D = \frac{E\left\{\sum_{i=1}^{n}(m_i - sp)^2\right\}}{Npq}.$$

In our case, the variables

$$X_1 = (m_1 - sp)^2, \quad X_2 = (m_2 - sp)^2, \quad \ldots \quad X_n = (m_n - sp)^2$$

are neither identical nor independent, although the degree of dependence
is evidently very slight. These variables can also be presented in the
form

$$(11) \qquad (x_a - p + x_{a+1} - p + \cdots + x_{a+s-1} - p)^2$$

taking successively $a = 1, s + 1, 2s + 1, \ldots (n - 1)s + 1$.

To find the mathematical expectation of (11) it suffices to notice that

$$E(x_i - p)^2 = E(x_i - p_i)^2 + (p_i - p)^2 = pq + (q - p)(p_1 - p)\delta^{i-1}$$

$$E(x_i - p)(x_j - p) = E(x_i - p_i)(x_j - p_j) + (p_i - p)(p_j - p) = pq\delta^{j-i} + (p_1 - p)(q - p)\delta^{j-1}$$

and then proceed exactly as in the approximate evaluation of $B_n$ in Sec. 7.
The final result is

$$E(x_a - p + x_{a+1} - p + \cdots + x_{a+s-1} - p)^2 = spq\,\frac{1 + \delta}{1 - \delta} - \frac{2pq\delta}{(1 - \delta)^2} + \frac{(q - p)(p_1 - p)(1 + \delta)}{(1 - \delta)^2}\,\delta^{a-1}$$
$$\qquad + \frac{2pq\,\delta^{s+1}}{(1 - \delta)^2} - \frac{(q - p)(p_1 - p)}{(1 - \delta)^2}\left[2s(1 - \delta) + 1 + \delta\right]\delta^{a+s-1}.$$

For somewhat large $s$ the two last terms in the right member are
completely negligible; so is the third term if $a \geq s + 1$. Hence, with a good
approximation,

$$E(X_1) = spq\,\frac{1 + \delta}{1 - \delta} - \frac{2pq\delta}{(1 - \delta)^2} + \frac{(q - p)(p_1 - p)(1 + \delta)}{(1 - \delta)^2}$$

$$E(X_i) = spq\,\frac{1 + \delta}{1 - \delta} - \frac{2pq\delta}{(1 - \delta)^2} \quad \text{if } i > 1$$

and

$$D = \frac{1 + \delta}{1 - \delta} - \frac{2\delta}{s(1 - \delta)^2} + \frac{(q - p)(p_1 - p)(1 + \delta)}{Npq(1 - \delta)^2}.$$

Again, when N is large, the last term can be dropped and as a good
approximation to D we can take

$$(12) \qquad D = \frac{1 + \delta}{1 - \delta} - \frac{2\delta}{s(1 - \delta)^2}.$$

It can be shown that the law of large numbers holds for variables $X_1,
X_2, \ldots X_n$ and therefore when $n$ (or the number of series) is large, the
empirical divergence coefficient is not likely to differ considerably from
D as given by the above approximate formula.

9. In order to see how far the theory of simple chains agrees with 
actual experiments, the author of this book himself has done extensive 
experimental work. To form a chain of trials, one can take two sets of 
cards containing red and black cards in different proportions, and 
proceed to draw one card at a time (returning it to the pack in which it 
belongs after each drawing) according to the following rules: At the 
outset one card is taken from a pack which we shall call the first set; 
then, whenever a red card is drawn, the next card is taken from the first 
set; but after a black card, the next one is taken from the second set. 
Evidently, these rules completely determine a series of trials possessing 
properties of a simple chain. In the first experiment the first pack 
contained 10 red and 10 black cards, while the second pack contained 5 
red and 15 black cards. Altogether, 10,000 drawings were made, and 
following their natural order, they were divided into 400 series of 25 
drawings each. The results are given in Table III. 

Table III. Distribution of Red Cards in 400 Series of 25 Cards

    Frequency of    Difference,    Number of series
    red cards, m       m - 8       with these frequencies

         1              -7                 2
         2              -6                 4
         3              -5                 8
         4              -4                27
         5              -3                29
         6              -2                54
         7              -1                37
         8               0                52
         9               1                47
        10               2                44
        11               3                41
        12               4                20
        13               5                20
        14               6                 7
        15               7                 4
        16               8                 3
        17               9                 1

The sum of the numbers in column 3 is 400, as it should be. Taking 
the sum of the products of numbers in columns 1 and 3, we get 3,323, which 
is the total number of red cards. The relative frequency of red cards in
10,000 trials is, therefore,

$$\frac{3{,}323}{10{,}000} = 0.3323.$$

In our case

$$\alpha = \tfrac{1}{2}, \quad \beta = \tfrac{1}{4}, \quad \delta = \tfrac{1}{4}$$

and the mean probability p in an infinite series of trials

$$p = \frac{\beta}{1-\delta} = \frac{1}{3} = 0.3333.$$

Thus, the relative frequency observed differs from p only by $10^{-3}$ and
in this respect the agreement between theory and experiment is very
satisfactory. Now let us consider the theoretical divergence coefficient
for which we have the approximate expression

$$D = \frac{1+\delta}{1-\delta} - \frac{2\delta}{s(1-\delta)^2}.$$

Here we must substitute $\delta = \tfrac{1}{4}$ and s = 25. The result is

D = 1.631, approximately.

To find the empirical divergence coefficient we must first evaluate the
sum

$$S = \sum \left(m - \tfrac{25}{3}\right)^2$$

extended over all 400 series. For the sake of easier calculation, we
present S thus:

$$S = \sum (m-8)^2 - \tfrac{2}{3}\sum (m-8) + \tfrac{400}{9}.$$

Now from Table III we get

$$\sum (m-8)^2 = 3{,}521; \qquad \sum (m-8) = 123$$

whence

S = 3,483.4.

Dividing this number by $Npq = 10{,}000 \cdot \tfrac{1}{3} \cdot \tfrac{2}{3} = 2{,}222.2$, we find the empirical
divergence coefficient

D' = 1.568

which differs from D = 1.631 by only about 0.06, well within reasonable
limits.
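The arithmetic of this comparison can be retraced mechanically. The following is a minimal sketch, assuming the data of Table III and the approximate formula (12); the names `theoretical_divergence` and `empirical_divergence` are illustrative and do not come from the text.

```python
from fractions import Fraction

# Table III: frequency of red cards m -> number of series with that frequency.
TABLE_III = {1: 2, 2: 4, 3: 8, 4: 27, 5: 29, 6: 54, 7: 37, 8: 52, 9: 47,
             10: 44, 11: 41, 12: 20, 13: 20, 14: 7, 15: 4, 16: 3, 17: 1}


def theoretical_divergence(delta, s):
    """Approximate formula (12): (1+delta)/(1-delta) - 2*delta/(s*(1-delta)^2)."""
    return (1 + delta) / (1 - delta) - 2 * delta / (s * (1 - delta) ** 2)


def empirical_divergence(table, s, p):
    """Sum of (m - s*p)^2 over all series, divided by N*p*q, where N = s * (number of series)."""
    n_series = sum(table.values())
    total = sum(count * (m - s * p) ** 2 for m, count in table.items())
    return total / (n_series * s * p * (1 - p))


s = 25
p = Fraction(1, 3)
delta = Fraction(1, 4)

red_total = sum(m * count for m, count in TABLE_III.items())
d_theory = float(theoretical_divergence(delta, s))
d_empirical = float(empirical_divergence(TABLE_III, s, p))
print(red_total, round(d_theory, 3), round(d_empirical, 3))  # 3323 1.631 1.568
```

Exact fractions are used so that the quantity 25/3 enters without rounding; only the final values are converted to decimals.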

10. In two other experiments two packs were used: one containing 
13 red and 7 black cards, and another 7 red and 13 black cards. In 
one experiment the pack with 13 red cards was considered as the first 
deck, and in the other experiment it became the second deck. The 
new experiments were conducted in the same way as that described in 
Sec. 9, but they were both carried to 20,000 trials divided into 1,000 
series of 20 trials each. In the first experiment, we have 

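$$\alpha = \tfrac{13}{20}, \quad \beta = \tfrac{7}{20}, \quad \delta = \tfrac{3}{10}, \quad p = \tfrac{1}{2}$$

and

D = 1.796, approximately,

while the same quantities for the second experiment are

$$\alpha = \tfrac{7}{20}, \quad \beta = \tfrac{13}{20}, \quad \delta = -\tfrac{3}{10}, \quad p = \tfrac{1}{2}$$

and

D = 0.556, approximately.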

The results of these experiments are recorded in the following two 
tables: 


Table IV. Concerning the First Experiment

Frequency of     Difference,     Number of series
red cards, m     m - 10          with these frequencies
      2              -8                  3
      3              -7                  5
      4              -6                 18
      5              -5                 36
      6              -4                 59
      7              -3                 93
      8              -2                103
      9              -1                117
     10               0                128
     11               1                121
     12               2                101
     13               3                 93
     14               4                 48
     15               5                 39
     16               6                 26
     17               7                  7
     18               8                  1
     19               9                  1
     20              10                  1


Table V. Concerning the Second Experiment

Frequency of     Difference,     Number of series
red cards, m     m - 10          with these frequencies
      5              -5                  2
      6              -4                 10
      7              -3                 48
      8              -2                112
      9              -1                193
     10               0                251
     11               1                201
     12               2                113
     13               3                 56
     14               4                  9
     15               5                  5
Taking the sum of the products of numbers in columns 1 and 3, we
find

10,036 and 10,045

as the total number of red cards in the first and second experiments.
Dividing these numbers by 20,000, we have the following relative
frequencies of red cards:

0.5018 and 0.50225

very near to p = 0.5. From the first table we find that

$$\sum (m - 10)^2 = 8{,}924,$$

summation being extended over all 1,000 series. Dividing this number
by $20{,}000 \cdot \tfrac{1}{4} = 5{,}000$, we find the empirical divergence coefficient in
the first experiment

D' = 1.785

which comes close to

D = 1.796.

Likewise, from the second table we find

$$\sum (m - 10)^2 = 2{,}709,$$

whence, dividing by 5,000,

D'' = 0.5418

again close to

D = 0.5562.

Thus, all the essential circumstances foreseen theoretically, for simple 
chains of trials, are in excellent agreement with our experiments. 
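The card experiment itself is easy to imitate with a pseudo-random generator. What follows is only a sketch under stated assumptions: the function names, the seed, and the choice of 4,000 series are illustrative, and the generator stands in for the shuffled packs. Here `alpha` is the chance of red after a red card (first pack) and `beta` the chance of red after a black card (second pack).

```python
import random


def simulate_chain(alpha, beta, n_series, s, seed=0):
    """Return the number of red cards in each of n_series consecutive series of s drawings."""
    rng = random.Random(seed)
    counts = []
    prob_red = alpha  # the very first card is taken from the first pack
    for _ in range(n_series):
        reds = 0
        for _ in range(s):
            red = rng.random() < prob_red
            reds += red
            # after a red card draw from the first pack, after a black one from the second
            prob_red = alpha if red else beta
        counts.append(reds)
    return counts


def divergence_from_counts(counts, s, p):
    """Empirical divergence coefficient: sum of (m - s*p)^2 divided by N*p*q."""
    n_total = len(counts) * s
    return sum((m - s * p) ** 2 for m in counts) / (n_total * p * (1 - p))


alpha, beta, s = 0.5, 0.25, 25
p = beta / (1 - (alpha - beta))          # mean probability, 1/3 for these packs
counts = simulate_chain(alpha, beta, n_series=4000, s=s, seed=1)
frequency = sum(counts) / (len(counts) * s)
print(frequency, divergence_from_counts(counts, s, p))
```

With the packs of the first experiment the printed frequency should fall near 1/3 and the empirical coefficient near 1.63, although the exact figures depend on the seed.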


Problems for Solution 

1. From an urn originally containing a white and b black balls, n balls are drawn
in succession, each ball drawn being replaced by 1 + c (c > 0) balls of the same color
before the next drawing. If m is the frequency of white balls, show that the
probability of the inequality

$$\left|\frac{m}{n} - \frac{a}{a+b}\right| < \epsilon$$

does not tend to 1 as n increases indefinitely (Markoff, G. Pólya).

Indication of the Proof. If $x_i = 1$ or $= 0$, according as a white or a black ball
appears in the ith drawing, we have

$$E(x_i) = E(x_i^2) = \frac{a}{a+b}; \qquad E(x_i x_j) = \frac{a}{a+b} \cdot \frac{a+c}{a+b+c} \quad (i \neq j).$$

Hence

$$B_n = E\left(x_1 + x_2 + \cdots + x_n - \frac{na}{a+b}\right)^2 = \frac{n^2 abc}{(a+b)^2(a+b+c)} + \frac{nab}{(a+b)(a+b+c)}.$$
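The expression for $B_n$ can be checked against an exact enumeration. This is a minimal sketch, assuming the urn scheme of Problem 1 (a net gain of c balls of the drawn color after each drawing); the function names are illustrative. It uses the fact that every particular order of m white and n - m black balls has the same probability.

```python
from fractions import Fraction
from math import comb


def polya_white_distribution(a, b, c, n):
    """Exact probabilities of m = 0, 1, ..., n white balls in n drawings."""
    denom = 1
    for t in range(n):
        denom *= a + b + t * c          # balls in the urn before drawing t + 1
    dist = []
    for m in range(n + 1):
        num = comb(n, m)                # number of orders with m white balls
        for i in range(m):
            num *= a + i * c            # white balls available at successive white draws
        for j in range(n - m):
            num *= b + j * c            # black balls available at successive black draws
        dist.append(Fraction(num, denom))
    return dist


def b_n_formula(a, b, c, n):
    """B_n as stated in the indication of the proof."""
    return (Fraction(n * n * a * b * c, (a + b) ** 2 * (a + b + c))
            + Fraction(n * a * b, (a + b) * (a + b + c)))


a, b, c, n = 2, 3, 1, 10
dist = polya_white_distribution(a, b, c, n)
mean = Fraction(n * a, a + b)
b_n = sum(prob * (m - mean) ** 2 for m, prob in enumerate(dist))
print(b_n, b_n_formula(a, b, c, n))
```

Since $B_n / n^2$ tends to the positive limit $abc / ((a+b)^2(a+b+c))$, the dispersion of m/n does not shrink to 0, which is the reason the law of large numbers fails here.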


2. Marbe's Problem. A group of exactly m uninterrupted successes E or failures F
in a Bernoullian series of trials with the probability p for a success is called an "m
sequence." If N is the frequency of m sequences in n trials, show that the probability
of the inequality

$$\left|\frac{N}{n} - (p^m q^2 + p^2 q^m)\right| < \epsilon$$

for a fixed $\epsilon$ converges to 1 as n becomes infinite.

Indication of the Proof. Associate with each of the $\mu = n - m + 1$ first trials
variables $x_1, x_2, \ldots, x_\mu$ assuming only two values, 0 and 1. For $1 < i < \mu$ we set
$x_i = 1$ if, beginning with the ith trial, a succession of m letters E or F is preceded and
followed by F or E. In all other cases $x_i = 0$. We set $x_1 = 1$ if, beginning with the
first trial, there is a succession of m letters E or F ended by F or E, otherwise $x_1 = 0$.
Finally, $x_\mu = 1$ if, beginning with the $\mu$th trial, there is a succession of m letters E or F
preceded by F or E, otherwise $x_\mu = 0$. Show that

$$E(x_1 + x_2 + \cdots + x_\mu) = (n - m - 1)(p^m q^2 + p^2 q^m) + 2(p^m q + p q^m)$$
$$E(x_1 + x_2 + \cdots + x_\mu)^2 = n^2 (p^m q^2 + p^2 q^m)^2 + nP$$

where P remains bounded.

3. The following interesting series of dependent trials has been suggested by S.
Bernstein: Two urns contain white and black balls. The probabilities of drawing
white balls from the first and second urns are, respectively, p and p'. The probabilities
of drawing black balls from the same urns are q = 1 - p and q' = 1 - p'. Finally,
the probability of taking a ball from the first urn at the outset of the trials is $\alpha$. A
series of trials is uniquely defined by the following rule: Whenever a white ball is
drawn (and returned), the next ball is drawn from the same urn; but when a black
ball is drawn, the next ball is taken from the other urn. Let $\alpha_n$ be the probability
that the nth ball will be drawn from the first urn when the results of other drawings
remain unknown. Under the same assumption, let $p_n$ be the probability of the nth
ball being white. Find general expressions of $\alpha_n$ and $p_n$.

Hint:

$$\alpha_{n+1} = \alpha_n (p + p' - 1) + 1 - p'$$

whence

$$\alpha_n = \frac{1 - p'}{2 - p - p'} + \left(\alpha - \frac{1 - p'}{2 - p - p'}\right)(p + p' - 1)^{n-1}.$$

Also

$$p_n = \alpha_n p + (1 - \alpha_n) p'$$

whence

$$p_n = \frac{p + p' - 2pp'}{2 - p - p'} + \left(\alpha - \frac{1 - p'}{2 - p - p'}\right)(p - p')(p + p' - 1)^{n-1}.$$
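The closed expressions of the hint can be compared with a direct iteration of the recurrence. The sketch below assumes the rule of Problem 3 exactly as stated; the function names and the sample values of $\alpha$, p, p' are illustrative only.

```python
from fractions import Fraction


def alpha_sequence(alpha, p, p_prime, n):
    """alpha_1, ..., alpha_n from alpha_{k+1} = alpha_k (p + p' - 1) + 1 - p'."""
    values = [alpha]
    for _ in range(n - 1):
        values.append(values[-1] * (p + p_prime - 1) + 1 - p_prime)
    return values


def alpha_closed(alpha, p, p_prime, n):
    """Closed form of alpha_n given in the hint."""
    limit = (1 - p_prime) / (2 - p - p_prime)
    return limit + (alpha - limit) * (p + p_prime - 1) ** (n - 1)


def white_closed(alpha, p, p_prime, n):
    """Closed form of p_n given in the hint."""
    alpha_limit = (1 - p_prime) / (2 - p - p_prime)
    white_limit = (p + p_prime - 2 * p * p_prime) / (2 - p - p_prime)
    return white_limit + (alpha - alpha_limit) * (p - p_prime) * (p + p_prime - 1) ** (n - 1)


alpha, p, p_prime = Fraction(1, 3), Fraction(3, 5), Fraction(1, 4)
alphas = alpha_sequence(alpha, p, p_prime, 8)
for n, a_n in enumerate(alphas, start=1):
    p_n = a_n * p + (1 - a_n) * p_prime      # p_n = alpha_n p + (1 - alpha_n) p'
    print(n, a_n, p_n)
```

Because |p + p' - 1| < 1 whenever neither urn is of a single color, both sequences settle quickly on their limits.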


4. When it becomes known that in the ith trial a white ball was drawn, what are
the probabilities $\alpha_j^{(i)}$ and $p_j^{(i)}$ of taking a ball from the first urn in the jth (j > i) trial
and of drawing a white ball in the same trial?

Hint: The probability that it was the first urn from which a white ball was
drawn in the ith trial is determined by Bayes' formula:

$$\alpha_{i+1}^{(i)} = \frac{\alpha_i p}{p_i}.$$

For $n \geq i + 1$

$$\alpha_{n+1}^{(i)} = \alpha_n^{(i)} (p + p' - 1) + 1 - p'$$

whence

$$\alpha_j^{(i)} = \frac{1 - p'}{2 - p - p'} + \left(\frac{\alpha_i p}{p_i} - \frac{1 - p'}{2 - p - p'}\right)(p + p' - 1)^{j-i-1}$$

for $j \geq i + 1$. Furthermore

$$p_j^{(i)} = \alpha_j^{(i)} p + (1 - \alpha_j^{(i)}) p'$$

for $j \geq i + 1$.

5. From now on we shall assume p + p' = 1 or p' = q, q' = p. Show that the
law of large numbers can be applied to variables $x_1, x_2, x_3, \ldots$ which are defined in
the usual way:

$x_i = 1$ if a white ball is drawn in the ith trial,
$x_i = 0$ if a black ball is drawn in the ith trial.

Indication of the Proof. Evidently $E(x_i) = E(x_i^2) = p_i$. Furthermore

$$B_n = \sum_{i=1}^{n} E(x_i - p_i)^2 + 2 \sum_{j > i} E(x_i - p_i)(x_j - p_j).$$

Now

$$E(x_i - p_i)^2 = 2pq(1 - 2pq); \quad i > 1$$
$$E(x_1 - p_1)^2 = pq + \alpha(1 - \alpha)(p - q)^2.$$

For j > i > 1

$$E(x_i - p_i)(x_j - p_j) = 0 \quad \text{if } j > i + 1$$
$$E(x_i - p_i)(x_{i+1} - p_{i+1}) = pq(1 - 4pq).$$

For i = 1 and j > 1

$$E(x_1 - p_1)(x_j - p_j) = 0 \quad \text{if } j > 2$$
$$E(x_1 - p_1)(x_2 - p_2) = \alpha p^2 + (1 - \alpha) q^2 - (1 - 2pq)(q + (p - q)\alpha).$$

Hence

$$B_n \sim 4pq(1 - 3pq)n$$

and the law of large numbers holds. It can be stated as follows: If in n trials the
frequency of white balls is m, then the probability of the inequality

$$\left|\frac{m}{n} - (p^2 + q^2)\right| < \epsilon$$

tends to 1 as n tends to infinity for any given positive number $\epsilon$.

6. Let $r = p^2 + q^2$ be the mean probability in infinitely many trials. Find the
divergence coefficient

$$D = \frac{E \sum_{i=1}^{n} (m_i - sr)^2}{N r (1 - r)}$$

when N = ns trials are divided in n consecutive groups containing s trials each.

Indication of Solution. From the foregoing formulas it follows that

$$E(x_a - r + x_{a+1} - r + \cdots + x_{a+s-1} - r)^2 = 4spq(1 - 3pq) - 2pq(1 - 4pq)$$

if a > 1. Hence

$$E \sum_{i=2}^{n} (m_i - sr)^2 = 4Npq(1 - 3pq) - 4spq(1 - 3pq) - 2(n - 1)pq(1 - 4pq).$$

Again

$$E(m_1 - sr)^2 = 4spq(1 - 3pq) - 2pq(3 - 10pq) + p(1 - 6q + 12q^2 - 4q^3) - \alpha(p - q)(1 - 8pq)$$

so that finally

$$D = \frac{2 - 6pq}{1 - 2pq} - \frac{1 - 4pq}{s(1 - 2pq)} + \frac{(p - q)(p - \alpha)(1 - 8pq)}{2Npq(1 - 2pq)}.$$

For large N with a good approximation

$$D = \frac{2 - 6pq}{1 - 2pq} - \frac{1 - 4pq}{s(1 - 2pq)}.$$

7. Two sets of cards containing respectively 12 red and 4 black cards (the first
deck) and 4 red and 12 black cards (the second deck) were used in the following
experiment: The first card was taken from the first deck; and in the following trials, after
a red card the next one was taken from the same deck, but after a black one the next
card was taken from the other deck. Altogether 25,000 cards were drawn, and in their
natural order were divided in 1,000 series of 25 cards each. The results are recorded
in Table VI. How close is the agreement between this experiment and the theory?


Table VI. Distribution of Red Cards in 1,000 Series of 25 Cards

Frequency of     Difference,     Number of series
red cards, m     m - 16          with these frequencies
      6             -10                  1
      7              -9                  1
      8              -8                  1
      9              -7                 12
     10              -6                 13
     11              -5                 43
     12              -4                 65
     13              -3                 92
     14              -2                101
     15              -1                162
     16               0                 94
     17               1                164
     18               2                 68
     19               3                110
     20               4                 26
     21               5                 28
     22               6                 10
     23               7                  7
     24               8                  1
     25               9                  1
Ans. In the present case $p = q' = \tfrac{3}{4}$, $p' = q = \tfrac{1}{4}$. Mean probability in infinitely
many trials:

$$p^2 + q^2 = \tfrac{5}{8} = 0.625.$$

Theoretical divergence coefficient: D = 1.384. Frequency of red cards: 15,696.
Relative frequency:

$$\frac{15{,}696}{25{,}000} = 0.62784,$$

close to 0.625.

Empirical divergence coefficient: D' = 1.3845, very close to 1.384.

The probability of taking a card from the second deck is 0.25. Now, by actual
counting, it was found that in 7,500 trials a card was taken from the second deck
1,856 times. Hence, the relative frequency of this event in 7,500 trials is

$$\frac{1{,}856}{7{,}500} = 0.2475,$$

again very close to 0.25.
CHAPTER XII


PROBABILITIES IN CONTINUUM

1. In the preceding parts of this book, whenever we dealt with 
stochastic variables, it was understood that their range of variation was 
represented by a finite set of numbers. Although, for the sake of better 
understanding of the subject, it was natural to begin with this simplest 
case, there are many reasons why it is necessary to introduce into the 
calculus of probability stochastic variables with infinitely many values. 
Such variables present themselves naturally in many cases of the type of 
Buffon’s needle problem which we had occasion to mention in Chap. VI. 

On the other hand, even in dealing with stochastic variables with a 
finite, but very large number of values, it is often profitable for the sake 
of approximate evaluations, to substitute for them fictitious variables 
with infinitely many values. Among these the most important ones by 
far are continuous variables. 

Case of One Variable 

2. Beginning with the case of a single continuous variable x, we must
assume that its range of variation is known and represented by a given
interval (a, b), finite or infinite. The knowledge only of the range of
variation of x would not enable us to consider x as a stochastic variable;
to be able to do so, we must introduce in some form or other the
considerations of probability. For a continuous variable it is as unnatural to
speak of the probability of any selected single value, as it is to speak of
the dimension of a single selected point on a line. But just as we speak
of the length of a segment of a line, we may introduce the notion of the
probability that x will be confined to a given interval (c, d), part of (a, b).

In introducing this new notion of probability in any manner whatsoever,
we must be careful not to fall into contradiction with the laws of
probability which are assumed as fundamental. To this end, if P(c, d)
is the probability for x to lie in the interval (c, d), we are led to assume

1° $P(c, d) \geq 0$
2° $P(a, b) = 1$.

The first assumption is an expression of the fact that probability
can never be negative. The second assumption corresponds to the fact
that x certainly assumes one out of the totality of its possible values.

Next, if the interval (c, d) is divided into two adjoining intervals
(c, e) and (e, d), we assume

3° $P(c, d) = P(c, e) + P(e, d)$

in conformity with the theorem of total probability.

For continuous variables it is furthermore assumed: 4° for an
infinitesimal interval (c, d), P(c, d) is also infinitesimal.

Properties 3° and 4° show that P(c, d) is a continuous function of c
and d and that

P(c, c) = 0.

In other words, the probability that x will assume any given value is 0.
At the same time P(c, d) represents the probability of any one of the four
inequalities

$$c < x < d; \quad c \leq x < d; \quad c < x \leq d; \quad c \leq x \leq d.$$

3. A simple example will serve to clarify these general considerations. 
A small ball of negligible dimensions is made to move on the rim of a 
circular disk. It is set in motion by a vehement impulse and after many 
complete revolutions, retarded by friction and the resistance of the air, 
comes to rest. The variety and complexity of causes influencing the 
motion of the ball make it impossible to foresee the final position of the 
ball when it comes to rest and the whole phenomenon bears characteristic 
features of a play of chance. The stochastic variable associated with this 
chance phenomenon is the distance from a certain definite point on the 
rim (origin) to the final position of the ball, counted in a definite direction, 
for example, clockwise. This variable, when we consider the ball as a 
mere point, may have any value between 0 and the length of the rim. 
The question now arises, how to define the probability that the ball will 
stop in a specified portion of the rim, or else that the variable we consider 
will have a value belonging to a definite interval, part of its total range 
of variation. In trying to define this probability, we must observe the 
fundamental requirements set forth in Sec. 2. Besides that, we must of 
necessity resort to considerations which are not mathematical in their 
nature but are based partly on aprioristic and partly on experimental 
grounds. Suppose we take two equal arcs on the rim. There is nothing 
perceptible a priori that would make the ball stop in one arc rather than 
in another. Besides, actual experiments show that the ball stops in one 
arc approximately the same number of times as in another, and this 
experimental knowledge together with aprioristic considerations suggests 
the assumption that we must attribute equal probabilities to equal arcs, 
irrespective of the position of the arcs on the rim. As soon as we agree on
this assumption or hypothesis, the problem becomes mathematical and
can easily be solved.

Before proceeding to the solution, a remark on the meaning of zero
probability in connection with continuous variables is not out of place. 
Zero probability in this case does not mean logical impossibility. We 
attribute zero probability to the event that the ball will stop precisely 
at the origin. However, that possibility is not altogether excluded 
so far as we consider the origin and the ball as mere points. The question 
lacks sense if we deal with a material ball and a material rim, no matter 
how small the former and how fine the latter. 

4. A stochastic variable is said to have uniform distribution of
probability if probabilities attached to two equal intervals are equal.
This means that P(c, d) depends only upon the length d - c = s of the
interval (c, d) and accordingly can be denoted simply by P(s). Combining
two adjoining intervals of the respective lengths s and s' into a
single interval of length s + s', according to requirement 3°, we must
have

(1) $$P(s + s') = P(s) + P(s').$$

Suppose now that the interval (a, b) of the length b - a = l, representing
the whole range of variation of x, is divided into n equal intervals
of the length l/n. The repeated application of equation (1) gives

$$P(l) = nP\left(\frac{l}{n}\right).$$

But by requirement 2° P(l) = 1 and hence

$$P\left(\frac{l}{n}\right) = \frac{1}{n}.$$

Again, repeated application of (1) gives

$$P\left(\frac{m}{n} l\right) = \frac{m}{n}$$

for any integer $m \leq n$. Now let us take any interval of length s. For an
appropriate m it will contain an interval of length $\frac{m}{n} l$ and be contained in an
interval of length $\frac{m+1}{n} l$; hence, referring to requirements 1° and 3°, we shall have

$$\frac{m}{n} \leq P(s) \leq \frac{m+1}{n}$$

while, by the choice of m,

$$\frac{m}{n} \leq \frac{s}{l} \leq \frac{m+1}{n}.$$

Since P(s) and s/l are contained in the same interval of length 1/n,

$$\left|P(s) - \frac{s}{l}\right| \leq \frac{1}{n}$$

and this being true for an arbitrary n, no matter how large, it follows that

$$P(s) = \frac{s}{l}.$$

Thus for a variable x with uniform distribution of probability, the
probability of assuming a value belonging to an interval of length s is
given by the ratio of s to the length l of the whole range of variation of x.

5. In the general case, when we cannot assume the uniform distribution
of probability throughout the whole range of variation of x, we let
ourselves be guided by an analogy with a mass distributed continuously
over a line. In fact, the distribution of a mass satisfies all the
requirements set forth for probability. In particular, the mass $\Delta m$ contained
in an infinitesimal interval $(z, z + \Delta z)$ is also infinitesimal and the mean
density

$$\frac{\Delta m}{\Delta z}$$

is generally supposed to tend, with $\Delta z$ converging to 0, to a limit called
"density at the point z." If this density $\rho(z)$ is known, the mass
contained in any interval (c, d) is represented by an integral

$$\int_c^d \rho(z)\,dz.$$

Following this analogy we admit that the mean density of probability

$$\frac{P(z, z + \Delta z)}{\Delta z}$$

tends to a limit f(z): density of probability at the point z when the length
of the interval $\Delta z$ tends to 0. Hence, again the probability corresponding
to an interval (c, d) will be represented by the integral

$$P(c, d) = \int_c^d f(z)\,dz.$$

This expression satisfies all the requirements of Sec. 2 if the density of
the probability f(z) is subject to two conditions:

(a) $f(z) \geq 0$ for all z in (a, b).

(b) $\int_a^b f(z)\,dz = 1$.

The second condition implies, of course, the existence of the integral itself.
But in all cases of any importance the density is continuous, save for
discontinuities of the simplest kind which do not cause any doubts as
to the existence of the above integral.

From the general expression of P(c, d) it follows that for an
infinitesimal interval (z, z + dz) the probability is given by f(z)dz neglecting
infinitesimals of a higher order. For the uniform distribution of
probability over an interval of length l the density is constant and = 1/l.
In other cases we cannot expect to obtain a definite expression for
density unless the variable itself is sufficiently characterized by
additional conditions, either hypothetical or implied by the problem. Thus,
for instance, in applications of probability to problems of theoretical
physics, the physicists have succeeded in obtaining definite probability
distributions by invoking physical laws of admitted universal validity
together with some plausible hypotheses.

6. The interval containing all possible values of a stochastic variable
may be finite or infinite according to the nature of that variable.
However, in all cases we may take the largest possible interval from $-\infty$ to
$+\infty$; to this end it suffices to define the density outside of the originally
given interval as being = 0. Then the density will be defined for all
real values of z and will satisfy the conditions:

(a) $f(z) \geq 0$ for all z

(b) $\int_{-\infty}^{\infty} f(z)\,dz = 1$

Furthermore, the probability for x to be in any interval (c, d) will be
given by

$$\int_c^d f(z)\,dz.$$

In particular, taking $c = -\infty$ and writing t instead of d,

$$F(t) = \int_{-\infty}^{t} f(z)\,dz$$

represents the probability that x will not exceed or will be less than t.
Considered as a function of t, F(t) is never decreasing and varies between
$F(-\infty) = 0$ and $F(+\infty) = 1$. It is called the "distribution function of
probability." In case x has uniform distribution of probability over an
interval (a, b) its distribution function is evidently defined as follows:

$$F(t) = 0 \quad \text{for } t < a$$
$$F(t) = \frac{t - a}{b - a} \quad \text{for } a \leq t \leq b$$
$$F(t) = 1 \quad \text{for } t > b.$$

Its graph is shown in Fig. 1 on page 240.
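The three-part definition of the distribution function for the uniform case translates directly into a short routine. This is a minimal sketch; the names `uniform_cdf` and `interval_probability` are illustrative and not part of the text.

```python
def uniform_cdf(t, a, b):
    """Distribution function F(t) of a variable uniformly distributed over (a, b)."""
    if t < a:
        return 0.0
    if t > b:
        return 1.0
    return (t - a) / (b - a)


def interval_probability(c, d, a, b):
    """P(c, d) = F(d) - F(c): probability of a value between c and d."""
    return uniform_cdf(d, a, b) - uniform_cdf(c, a, b)


# An interval of length 2 inside a range of length 4 receives probability 2/4.
print(uniform_cdf(3, 2, 6), interval_probability(3, 5, 2, 6))
```

The difference F(d) - F(c) reproduces the ratio of the interval length to the length of the whole range whenever (c, d) lies inside (a, b).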
7. The definition of mathematical expectation can easily be extended
to continuous variables; namely, the expectation of x or the mean value
of x is defined by

$$E(x) = \int_{-\infty}^{\infty} z f(z)\,dz$$

provided this integral exists. Similarly, the mathematical expectation
of any function $\varphi(x)$ is given by

$$E(\varphi(x)) = \int_{-\infty}^{\infty} \varphi(z) f(z)\,dz.$$

Of course, the existence of the integral in the right member is presupposed
again. When this integral does not exist, it is meaningless to speak of
the mathematical expectation of $\varphi(x)$.

[Fig. 1. Graph of the distribution function F(t); the points $-\infty$, a, b, $+\infty$ are marked on the axis.]

The mathematical expectation of the power $x^n$ with positive integer exponent
is called the moment of the order n or
nth moment. We shall denote it by $m_n$ so that

$$m_n = \int_{-\infty}^{\infty} z^n f(z)\,dz.$$

The dispersion D and the standard deviation $\sigma$ of x are defined in the same
way as in Chap. IX; namely,

$$D = \sigma^2 = E(x - m_1)^2 = \int_{-\infty}^{\infty} (z - m_1)^2 f(z)\,dz = m_2 - m_1^2.$$

Often it is advisable to consider the mathematical expectation of $|x|^a$
where a may be any real number, ordinarily positive. This expectation
is called the "absolute moment of the order a." Its expression is

$$\mu_a = \int_{-\infty}^{\infty} |z|^a f(z)\,dz$$

and it is evident that

$$m_{2k} = \mu_{2k}; \qquad |m_{2k+1}| \leq \mu_{2k+1}.$$

The mathematical expectation of the function

$$e^{itx}$$

where t is a real variable, is of the utmost importance. It is called the
"characteristic function" of distribution and is defined by

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itz} f(z)\,dz.$$

Since $f(z) \geq 0$ and

$$\int_{-\infty}^{\infty} f(z)\,dz = 1$$

the integral defining $\varphi(t)$ is always convergent and

$$|\varphi(t)| \leq 1.$$

The distribution is completely determined by its characteristic
function. Because by the Fourier theorem

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\,dt \int_{-\infty}^{\infty} e^{itz} f(z)\,dz = f(x)$$

at all points of continuity of f(x). But the left-hand member is

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi(t) e^{-itx}\,dt$$

by the definition of $\varphi(t)$ and so

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi(t) e^{-itx}\,dt.$$

8. To illustrate the preceding general explanations we shall now
consider a few examples.

Example 1. Let x be a variable with uniform distribution of probability over
the interval (0, 1). The density of this distribution being constant

$$f(z) = 1,$$

the mean value of x is

$$m_1 = \int_0^1 z\,dz = \frac{1}{2}$$

and the second moment

$$m_2 = \int_0^1 z^2\,dz = \frac{1}{3}.$$

Hence, the square of the standard deviation

$$\sigma^2 = m_2 - m_1^2 = \frac{1}{12}.$$

This simple example may be used to illustrate a remark made at the beginning of this
chapter, that sometimes it is profitable to substitute for a variable with a finite but
large number of values a fictitious continuous variable. Suppose that in flipping a coin
n times, we mark heads by 1 and tails by 0, thus obtaining a sequence comprising n
units and zeros altogether, disposed in the order of trials. This sequence may be
considered as successive digits in the binary representation of a fraction:

$$X = \frac{\epsilon_1}{2} + \frac{\epsilon_2}{4} + \cdots + \frac{\epsilon_n}{2^n}$$

contained between 0 and 1. X may be considered as a stochastic variable with $2^n$
values each having the probability $1/2^n$. The probability $\Pi(\alpha, \beta)$ that X will be
contained in the interval $(\alpha, \beta)$, or more definitely that X will satisfy the inequalities

$$\alpha < X \leq \beta$$

is obviously obtained by multiplying the number of integers N contained in the limits

$$2^n \alpha < N \leq 2^n \beta$$

by $1/2^n$. Now there are exactly

$$[2^n \beta] - [2^n \alpha] = 2^n(\beta - \alpha) + \theta; \quad -1 < \theta < 1$$

such integers; hence

$$\Pi(\alpha, \beta) = \beta - \alpha + \frac{\theta}{2^n}.$$

If n is even moderately large, this probability is very near to the probability

$$P(\alpha, \beta) = \beta - \alpha$$

that a fictitious variable x with uniform distribution over the interval (0, 1) will
assume a value in the interval $(\alpha, \beta)$. The first two moments of the variable X are,
respectively

$$M_1 = \frac{0 + 1 + 2 + \cdots + (2^n - 1)}{2^{2n}} = \frac{1}{2} - \frac{1}{2^{n+1}}$$
$$M_2 = \frac{0^2 + 1^2 + 2^2 + \cdots + (2^n - 1)^2}{2^{3n}} = \frac{1}{3} - \frac{1}{2^{n+1}} + \frac{1}{3 \cdot 2^{2n+1}}$$

and differ little from the respective moments $\tfrac{1}{2}$ and $\tfrac{1}{3}$ of the fictitious continuous
variable. Without losing anything essential, we here gain considerably in
simplicity by substituting a fictitious continuous variable for the discontinuous variable
X.
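The statements about the discontinuous variable X can be verified by direct enumeration for a moderate n. The following sketch assumes X takes the $2^n$ values $k/2^n$ with equal probabilities, as in Example 1; the function names are illustrative.

```python
from fractions import Fraction


def binary_fraction_moments(n):
    """Exact first and second moments of X = k / 2^n, k = 0, 1, ..., 2^n - 1."""
    size = 2 ** n
    m1 = sum(Fraction(k, size) for k in range(size)) / size
    m2 = sum(Fraction(k, size) ** 2 for k in range(size)) / size
    return m1, m2


def pi_probability(alpha, beta, n):
    """Probability of the inequalities alpha < X <= beta."""
    size = 2 ** n
    hits = sum(1 for k in range(size) if alpha < Fraction(k, size) <= beta)
    return Fraction(hits, size)


n = 10
m1, m2 = binary_fraction_moments(n)
print(m1, Fraction(1, 2) - Fraction(1, 2 ** (n + 1)))
print(m2, Fraction(1, 3) - Fraction(1, 2 ** (n + 1)) + Fraction(1, 3 * 2 ** (2 * n + 1)))

alpha, beta = Fraction(1, 3), Fraction(3, 4)
print(float(pi_probability(alpha, beta, n)), float(beta - alpha))
```

Already for n = 10 the discrete probability differs from $\beta - \alpha$ by less than $1/2^{10}$, in accordance with the bound on $\theta$.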

Example 2. A thin bar can rotate freely about its middle point P. It is set in
motion and after several revolutions comes to a stop pointing toward a point X on a
line l. The position of the bar is determined by an angle $\theta$
formed by itself and the perpendicular PO dropped from P on l; $\theta$
varies between the limits $-\pi/2$ and $\pi/2$ and its distribution is
supposed to be uniform. The position of X is determined by
distance OX = x from O, this distance being positive or negative
according as X is to the right or to the left of the point O.
It is required to find the distribution of the probability of x.

[Fig. 2.]

The relation between $\theta$ and x is

$$x = a \operatorname{tg} \theta$$

if OP = a or, conversely,

$$\theta = \operatorname{arc\,tg} \frac{x}{a}.$$

By differentiation we find the relation between $d\theta$ and dx:

$$d\theta = \frac{a\,dx}{a^2 + x^2}.$$

Now, by hypothesis, the probability that $\angle OPX$ will be contained between $\theta$ and
$\theta + d\theta$ is

$$\frac{d\theta}{\pi} = \frac{1}{\pi} \frac{a\,dx}{a^2 + x^2}.$$

And the probability that the distance of X from O will be contained between x and
x + dx is the same. Hence, the density of probability for the variable x is

$$f(x) = \frac{1}{\pi} \frac{a}{a^2 + x^2}$$

and the probability corresponding to a finite interval (c, d) is given by

$$P(c, d) = \frac{1}{\pi} \int_c^d \frac{a\,dz}{a^2 + z^2}.$$

For the whole range of variation of x

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{a\,dz}{a^2 + z^2} = 1$$

as it should be. However, we cannot speak of the mean value of x or of moments of
higher order, since the integrals

$$\int_{-\infty}^{\infty} \frac{x\,dx}{a^2 + x^2}, \qquad \int_{-\infty}^{\infty} \frac{x^2\,dx}{a^2 + x^2}, \ldots$$

have no meaning. But the characteristic function $\varphi(t)$ exists and is given by

$$\varphi(t) = \frac{a}{\pi} \int_{-\infty}^{\infty} \frac{e^{itx}\,dx}{a^2 + x^2}.$$

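Example 3. One of the most important distributions (theoretically and
practically) is the so-called "Gaussian" or "normal" distribution. The density of this
distribution is given by

$$f(z) = K e^{-h^2 (z - a)^2}$$

with three parameters K, h, a. However, only two of these parameters are
independent, since we must have

$$\int_{-\infty}^{\infty} f(z)\,dz = K \int_{-\infty}^{\infty} e^{-h^2 (z - a)^2}\,dz = K \int_{-\infty}^{\infty} e^{-h^2 u^2}\,du = \frac{K \sqrt{\pi}}{h} = 1$$

and finally

$$f(z) = \frac{h}{\sqrt{\pi}} e^{-h^2 (z - a)^2}.$$

To find the meaning of a and h we observe that the mean value of our variable is

$$\frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-h^2 (z - a)^2} z\,dz = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} (z - a) e^{-h^2 (z - a)^2}\,dz + \frac{ah}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-h^2 (z - a)^2}\,dz = a.$$

Thus a has the meaning of the mean value of the normally distributed variable x.
The square of the standard deviation is given by

$$\sigma^2 = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-h^2 (z - a)^2} (z - a)^2\,dz = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-h^2 u^2} u^2\,du = \frac{1}{2h^2}$$

whence

$$h = \frac{1}{\sigma \sqrt{2}}.$$

Thus for the normally distributed variable with the mean a and standard deviation $\sigma$
the density of probability is

$$f(z) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(z - a)^2}{2\sigma^2}}.$$

Finally, for the variable u = x - a with the mean value 0 and the same standard
deviation, the expression of density takes the simplest form

$$f(z) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{z^2}{2\sigma^2}}$$

and the distribution function of probability is represented by the integral

$$F(t) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{t} e^{-\frac{z^2}{2\sigma^2}}\,dz.$$

The curve of density

$$y = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}}$$

or the probability curve has a bell-shaped form as shown in the figure corresponding
to $\sigma = 1$. It has a single maximum corresponding
to x = 0 and on both sides of this
maximum it rapidly approaches the x axis.

[Fig. 3. The probability curve for $\sigma = 1$.]

The characteristic function of normal
distribution has a very simple form. By
definition

$$\varphi(t) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2\sigma^2}} e^{itz}\,dz.$$

But as

$$\int_{-\infty}^{\infty} e^{-\frac{z^2}{2\sigma^2}} e^{itz}\,dz = \sigma \sqrt{2\pi}\, e^{-\frac{\sigma^2 t^2}{2}}$$

we find that

$$\varphi(t) = e^{-\frac{\sigma^2 t^2}{2}}.$$

The moments of normal distribution (with the mean = 0) can now be easily found.
From the definition of the characteristic function it follows that

$$\varphi(t) = \sum_{n=0}^{\infty} m_n \frac{(it)^n}{n!}.$$

In our case

$$\varphi(t) = \sum_{k=0}^{\infty} (-1)^k \frac{\sigma^{2k} t^{2k}}{2^k\, k!}.$$

Thus

$$m_{2k+1} = 0$$
$$m_{2k} = 1 \cdot 3 \cdot 5 \cdots (2k - 1)\, \sigma^{2k}.$$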
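The values of the moments can be confirmed by numerical integration of $z^k f(z)$. This is a minimal sketch, assuming the normal density with mean 0 and standard deviation $\sigma$ and a plain midpoint rule over a range of twelve standard deviations on each side; the function names and the numerical settings are illustrative choices.

```python
import math


def normal_moment_numeric(k, sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of the integral of z^k f(z) for the normal density with mean 0."""
    lower = -half_width * sigma
    h = 2 * half_width * sigma / steps
    norm = 1.0 / (sigma * math.sqrt(2 * math.pi))
    total = 0.0
    for i in range(steps):
        z = lower + (i + 0.5) * h
        total += z ** k * math.exp(-z * z / (2 * sigma * sigma))
    return norm * total * h


def normal_moment_formula(k, sigma):
    """0 for odd k; 1 * 3 * 5 * ... * (k - 1) * sigma^k for even k."""
    if k % 2 == 1:
        return 0.0
    result = sigma ** k
    for odd in range(1, k, 2):
        result *= odd
    return result


sigma = 1.5
for k in range(1, 7):
    print(k, normal_moment_numeric(k, sigma), normal_moment_formula(k, sigma))
```

The odd moments vanish by symmetry, and the even ones grow as the product of odd numbers multiplied by the corresponding power of $\sigma$.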


Case of Two or More Variables 
9. By analogy it is easy now to extend the notion of probability to
two or more variables considered simultaneously. A pair of special
values x, y of two stochastic variables X, Y will be represented
geometrically by a point with the coordinates x, y referred to a rectangular
system of axes. The domain S of all the possible values of X and Y will
be represented by a portion (finite or infinite) of a plane with a definite
boundary unless this domain coincides with the whole plane. The
probability that the point x, y should belong to an infinitesimal area
dxdy will be expressed by the product $\varphi(x, y)\,dxdy$ where the function
$\varphi(x, y)$ is again called the density of probability at the point x, y. The
density of probability must satisfy two requirements: it is non-negative
in the whole domain S and

$$\iint_S \varphi(x, y)\,dxdy = 1$$

where the double integral is extended over all the domain S. The
probability for the point x, y to be located in a given domain $\sigma$ is then
given by the integral

$$\iint_\sigma \varphi(x, y)\,dxdy$$

extended over $\sigma$.

If $\varphi(x, y)$ is a constant in S, the distribution of probability is called
uniform. The domain S in this case must be finite and if its area is
denoted by the same letter, then

$$\varphi(x, y) = \frac{1}{S}.$$

The probability for the point x, y to be within the domain $\sigma$ will be given
by the ratio

$$\frac{\sigma}{S}$$

denoting the area of the domain $\sigma$ by $\sigma$ again.
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10. We can always substitute the whole plane for the domain flf. 
To that end it suffices to set 


y) = 0 

in all points not belonging to S, We shall then have 

vix, y) ^ 0 


everjrwhere and 

y)dxdy = 1. 

By doing so we have the advantage of stating results in a perfectly general 
form without mentioning the domain S, However, in dealing with 
particular problems, it is more convenient to consider only those points 
which can actually represent simultaneous values of the variables. 
The probability of simultaneous inequalities 


a < X < h; c < y < d 


according to the general definition is represented by the double integral 

rr <pix, y)dxdy. 

This corresponds to the compound probability of two events and we must 
see that the fundamental theorem of compound probabilities continues 
to hold. Taking c = —<»,d = +<» the repeated integral 

\(x, y)dy 

represents the probability P(a, h) for the variable X (as if it were con¬ 
sidered alone without any reference to Y) to have its value in (a, b). 
The function 


f(x) = y)dy 

represents the density of probability of X. Thus 
P(a, h) = £f{x)dx. 

In a similar way 

P(y) - y)<^ 

represents the density of the probability of Y ; and the probability Q(e, d) 
that this variable has its value in (c, d) is given by 

Q(c, d) = /V(y)dy. 
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Now the double integral 

s:s: ip{x, y)dxdy 

can be written in either of the forms 


where 


y)dxdy = j^f{x)dx • ^^Fi{y)dy 
y)dxdy = jy{y)dy ■ £Ux)dx 


r<p(x, y)dx 

FM = -: 

fj(x}dx 


Mx) = 


y)dy 

£Fiy)dy 


may be considered as densities of conditional probabilities, respectively,
for Y when it is known that X has a value in (a, b) and for X when it is
known that Y has a value in (c, d). The preceding expressions for the
probability of the simultaneous inequalities

$$a < X < b, \quad c < Y < d$$

have the same form as the theorem of compound probability and may be
considered as its extension. The conditional probability for Y to have
its value in (c, d) when it is known that X has its value in (a, b) is given by

$$\int_c^d F_1(y)\,dy.$$

Now, we define variables X and Y as independent when the probability
for Y to be in (c, d) is not affected by the knowledge that X belongs
to (a, b), which means that

$$\int_c^d F_1(y)\,dy = \int_c^d F(y)\,dy$$

or

$$\int_a^b\int_c^d \varphi(x, y)\,dx\,dy = \int_c^d F(y)\,dy \cdot \int_a^b f(x)\,dx$$

and, since intervals (a, b) and (c, d) are arbitrary,

$$\varphi(x, y) = f(x) \cdot F(y)$$

at points of continuity. Hence, the density of probability for two
independent variables is a product of a function of x alone by a function
of y alone. Conversely, when this condition is satisfied the variables are
independent. For independent variables the probability of the
simultaneous inequalities

$$a < X < b, \quad c < Y < d$$

has a simple expression

$$\int_a^b f(x)\,dx \cdot \int_c^d F(y)\,dy$$

which is the product of the probability for X to have its value in the
interval (a, b) by the probability for Y to have its value in the interval
(c, d), in perfect analogy with the compound probability of two
independent events.
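
The product rule for independent variables can be checked numerically. The following Python sketch is only an illustration: the two densities on the unit square (4xy, which factors, and x + y, which does not), the midpoint-rule integrator and all function names are assumptions introduced here, not part of the text.

```python
def integrate2(phi, a, b, c, d, n=200):
    """Midpoint-rule approximation of the double integral of phi over (a,b) x (c,d)."""
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        for j in range(n):
            y = c + (j + 0.5) * hy
            total += phi(x, y)
    return total * hx * hy


def independent_density(x, y):
    # Hypothetical density 4xy on the unit square: a product f(x) * F(y).
    return 4.0 * x * y


def dependent_density(x, y):
    # Hypothetical density x + y on the unit square: not a product.
    return x + y


def joint_and_product(phi, a, b, c, d):
    """Return P(a<X<b, c<Y<d) and the product P(a<X<b) * P(c<Y<d)."""
    joint = integrate2(phi, a, b, c, d)
    p_x = integrate2(phi, a, b, 0.0, 1.0)   # Y unrestricted on (0, 1)
    p_y = integrate2(phi, 0.0, 1.0, c, d)   # X unrestricted on (0, 1)
    return joint, p_x * p_y


if __name__ == "__main__":
    print(joint_and_product(independent_density, 0.0, 0.5, 0.0, 0.5))
    print(joint_and_product(dependent_density, 0.0, 0.5, 0.0, 0.5))
```

For the first density the two numbers agree; for the second they differ, so those variables are not independent.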

Finally, the mathematical expectation of any function $\psi(x, y)$ can be
defined by

$$E(\psi(x, y)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \psi(x, y)\,\varphi(x, y)\,dx\,dy$$

provided the integral in the right member exists.

11. It is hardly necessary to dwell at length upon the case of several
stochastic variables. A system of particular values $x_1, x_2, \ldots x_n$ of
n stochastic variables $X_1, X_2, \ldots X_n$ may be considered as a point in
n-dimensional space. The density of probability is a non-negative function
$\varphi(x_1, x_2, \ldots x_n)$ defined in the whole space and satisfying the
condition

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}
\varphi(x_1, x_2, \ldots x_n)\,dx_1\,dx_2 \cdots dx_n = 1.$$

The probability for a point representing $X_1, X_2, \ldots X_n$ to be located
in a given domain $\sigma$ is given by the integral

$$\int\!\!\int\cdots\int \varphi(x_1, x_2, \ldots x_n)\,dx_1\,dx_2 \cdots dx_n$$

extended over $\sigma$. In the case of uniform distribution of probability,
$\varphi(x_1, x_2, \ldots x_n)$ is by definition a constant in a certain finite
region of space and = 0 outside of that region. If V is the volume of that
region and v the volume of the domain $\sigma$, the ratio v/V gives the
probability that a point belongs to $\sigma$.

The probability of the simultaneous inequalities

$$a_1 < X_1 < b_1; \quad a_2 < X_2 < b_2; \quad \ldots \quad a_n < X_n < b_n$$

is given by the integral

$$\int_{a_1}^{b_1}\int_{a_2}^{b_2}\cdots\int_{a_n}^{b_n}
\varphi(x_1, x_2, \ldots x_n)\,dx_1\,dx_2 \cdots dx_n$$

which, by introduction of the conditional probabilities as in the case of
two variables, can be put into the form of a product of n integrals in a
manner perfectly analogous to the expression of the probability of a
compound event with n components. Finally, the variables are independent
if the density $\varphi(x_1, x_2, \ldots x_n)$ is a product of n functions
depending only upon $x_1, x_2, \ldots x_n$, respectively, and conversely.

The expression

$$E[\psi(x_1, x_2, \ldots x_n)] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}
\psi\,\varphi\,dx_1\,dx_2 \cdots dx_n$$

serves to define the mathematical expectation of any function
$\psi(x_1, x_2, \ldots x_n)$ of $X_1, X_2, \ldots X_n$.

12. Since in introducing the extended idea of probability we took 
care to preserve the fundamental theorems of the calculus of probability, 
we may be sure that other theorems derived from them will hold for 
continuous variables. In particular, theorems concerning mathematical 
expectation and the fundamental lemma in Chap. X, Sec. 1, hold for 
continuous variables. Upon this basis as we have seen was built the 
proof of the law of large numbers. Hence, this important theorem 
applies equally to continuous variables. 


Geometrical Problems 

13. A few geometrical problems will afford a good illustration of the 
foregoing general principles. 

Problem 1. A rectilinear segment AB is divided by a point C into
two parts AC = a, CB = b. Points X and Y are taken at random on AC
and CB, respectively. What is the probability that AX, XY, BY can form
a triangle?

Solution. We must first agree upon the meaning of the expression
"at random." The idea suggested by this expression implies that the
way of selecting points X and Y gives no preference to
any point of AC and CB, respectively. Consequently,
variables x = AX and y = BY may be assumed to have
uniform distribution of probability. The domain of the
point x, y is a rectangle OMPN with the sides OM = a,
ON = b. In order that AX, XY, BY can form a triangle
the following inequalities must be fulfilled:

$$x < (a + b - x - y) + y \quad\text{or}\quad x < a + b - x$$

$$y < (a + b - x - y) + x \quad\text{or}\quad y < a + b - y$$

$$a + b - x - y < x + y.$$

Fig. 5.

These inequalities are equivalent to

$$x < \frac{a + b}{2}, \qquad y < \frac{a + b}{2}, \qquad x + y > \frac{a + b}{2}.$$

To interpret them geometrically, through P draw a line QPR making
$\angle RQO = 45°$. From the mid-point V of QR drop the perpendiculars
VS, VW on OX, OY. Then the preceding inequalities limit the position
of the point x, y to the shaded area SVW, whose part TSU is contained
in the rectangle OMPN. Variables x and y are independent and have
uniform distribution. Hence, the density of probability of the pair
x, y is constant and the probability that the point x, y is in the triangle
TSU will be

$$\frac{\text{Area } TSU}{\text{Area } OMPN} = \frac{\tfrac{1}{2}b^2}{ab} = \frac{b}{2a}$$

(this value of the area of TSU supposes $b \leq a$). At the same time this
is the probability for AX, XY, BY to form a triangle.
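
The value b/(2a) can be put to an experimental test of the kind the text recommends. The Python sketch below is a minimal Monte Carlo illustration under the stated hypothesis (x = AX uniform on AC, y = BY uniform on CB, and b not greater than a); the function name, the seed and the number of trials are assumptions made here.

```python
import random


def triangle_probability(a, b, trials=200_000, seed=0):
    """Estimate the probability that AX, XY, BY form a triangle (Problem 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, a)   # AX, with X taken at random on AC
        y = rng.uniform(0.0, b)   # BY, with Y taken at random on CB
        z = a + b - x - y         # XY, the middle piece
        if x < y + z and y < x + z and z < x + y:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    a, b = 3.0, 2.0
    print(triangle_probability(a, b), b / (2 * a))
```

With a = 3 and b = 2 the estimate should fall close to 1/3.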

Problem 2. On a line AB two points $X_1$, $X_2$ are taken at random.
What is the probability that $AX_1$, $X_1X_2$, $X_2B$ can form a triangle?

Fig. 6. Fig. 7.

Solution. Variables $AX_1 = x_1$, $AX_2 = x_2$ are independent and have
uniform distribution of probability. The domain of all possible positions
of the point $x_1, x_2$ is a square with the side AB = l. Positions of this
point when $AX_1$, $X_1X_2$, $X_2B$ form a triangle can be characterized as
follows. First, if $X_1$ precedes $X_2$, we have

$$x_2 - x_1 < x_1 + l - x_2 \quad\text{or}\quad x_2 - x_1 < \frac{l}{2}$$

$$x_1 < x_2 - x_1 + l - x_2 \quad\text{or}\quad x_1 < \frac{l}{2}$$

$$l - x_2 < x_2 - x_1 + x_1 \quad\text{or}\quad x_2 > \frac{l}{2}$$

which means that $x_1, x_2$ belongs to the triangle OPN, the definition of
which is evident if L, M, N, P are mid-points of the sides of the square
ABCD. Second, if $X_1$ follows $X_2$, we have

$$x_1 - x_2 < \frac{l}{2}, \qquad x_2 < \frac{l}{2}, \qquad x_1 > \frac{l}{2}$$

and these inequalities define the area OLM. Since the distribution of
$x_1, x_2$ is uniform, the required probability is

$$\frac{\text{Area } OLM + \text{Area } ONP}{\text{Area } ABCD} = \frac{\tfrac{1}{4}l^2}{l^2} = \frac{1}{4}.$$
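
As a rough check of the answer 1/4, the two points can be drawn repeatedly and the three pieces tested directly. This Python sketch is only an illustration; the function name, the seed and the trial count are assumptions, and the triangle test uses the equivalent condition that no piece reaches half of the whole length.

```python
import random


def broken_segment_probability(length=1.0, trials=200_000, seed=0):
    """Estimate the probability that AX1, X1X2, X2B form a triangle (Problem 2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1 = rng.uniform(0.0, length)
        x2 = rng.uniform(0.0, length)
        lo, hi = min(x1, x2), max(x1, x2)
        pieces = (lo, hi - lo, length - hi)
        # Three pieces of total length l form a triangle exactly when
        # each of them is shorter than l/2.
        if max(pieces) < length / 2:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(broken_segment_probability())
```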

Problem 3. A chord is drawn at random in a given circle. What is 
the probability that it is greater than the side of the equilateral triangle 
inscribed in that circle?

Solution 1. The position of the chord drawn at random can be
determined by its distance from the center of the circle. This distance may
vary between 0 and R, the radius of the circle. The chord is greater
than the side of the equilateral triangle inscribed in the circle if its
distance from the center is less than $\tfrac{1}{2}R$. Hence, the required
probability

$$p_1 = \frac{\tfrac{1}{2}R}{R} = \frac{1}{2}.$$

Solution 2. Through one end A of the chord AB, draw a tangent AT.
The angle $\varphi$ between the chord and the tangent, varying from 0° to
180°, determines the position of the chord. If the chord is greater than
the side of the inscribed equilateral triangle, the angle $\varphi$ must lie
between 60° and 120°. Hence the answer

$$p_2 = \frac{120° - 60°}{180°} = \frac{1}{3}.$$

The fact that we obtain two different numbers for the same probability
seems paradoxical, and the problem itself is known as "Bertrand's
paradox." However, going attentively over both solutions, we discover
that we are really dealing with two different problems. In the first
solution it was assumed that the distance of the chord from the center
has uniform distribution, while in the second solution the distribution
of the angle $\varphi$ was taken as uniform. The second solution may be
considered reasonable if a thin bar or a needle can rotate freely about A
and if, being set in motion, it determines the chord AB by its ultimate
position. On the other hand, the first solution is acceptable if a circular
disk is thrown upon a board ruled with parallel lines distant from one
another by the diameter of the disk. The intersection of the disk with
one of the lines determines a chord, and the probability that it is greater
than the side of the inscribed equilateral triangle can reasonably be
assumed to be $\tfrac{1}{2}$.

A general remark applies to all problems of this kind. When a 
certain geometrical element, such as a point or a line, is supposed to be 
taken at random, it should be clearly indicated by what kind of 
mechanism this is to be done. Only then the hypothetically assumed 
distribution can be put to an experimental test and either confirmed 
(approximately) or rejected. 
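
The dependence of the answer on the mechanism can be illustrated by simulating both hypotheses of Bertrand's problem side by side. The Python sketch below is an assumption-laden illustration, not an experiment from the text: in the first count the distance of the chord from the center is uniform on (0, R), in the second the angle between chord and tangent is uniform on (0°, 180°). The names and the seed are hypothetical.

```python
import math
import random


def bertrand_estimates(trials=200_000, seed=0, radius=1.0):
    """Return the two estimated probabilities for the two chord mechanisms."""
    rng = random.Random(seed)
    side = radius * math.sqrt(3.0)   # side of the inscribed equilateral triangle
    by_distance = 0
    by_angle = 0
    for _ in range(trials):
        # Mechanism 1: uniform distance from the center.
        dist = rng.uniform(0.0, radius)
        if 2.0 * math.sqrt(radius * radius - dist * dist) > side:
            by_distance += 1
        # Mechanism 2: uniform angle between the chord and the tangent at A;
        # a chord making the angle phi with the tangent has length 2R sin(phi).
        phi = rng.uniform(0.0, math.pi)
        if 2.0 * radius * math.sin(phi) > side:
            by_angle += 1
    return by_distance / trials, by_angle / trials


if __name__ == "__main__":
    print(bertrand_estimates())
```

The first estimate should come out near 1/2 and the second near 1/3, in keeping with the two solutions.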

14. Buffon's Needle Problem. A board is ruled with equidistant
parallel lines, the width of the strip between two consecutive lines being
d. A needle so fine that it can be likened to a rectilinear segment of the
length l < d is thrown on the board. What is the probability that the
needle will intersect one of the lines (naturally not more than one)?

Solution. This is the oldest problem dealing with geometrical
probabilities. It was mentioned by Buffon, the celebrated French
naturalist of the eighteenth century, in the Proceedings of the Paris
Academy of Sciences (1733) and later reproduced with its solution in
Buffon's book "Essai d'arithmétique morale," published in 1777.

Let us determine the position of the needle by the distance OP = x of
its middle point from the nearest line, and the acute angle $\varphi$ between OP
and the needle. Variables x and $\varphi$ may be considered as independent.
Furthermore, x and $\varphi$ vary respectively between 0 and d/2 and 0 and
$\pi/2$. As a hypothesis we assume the distribution of probability for
x and $\varphi$ as uniform. The domain of x, $\varphi$ is a rectangle OABC with
OA = $\pi/2$, OC = d/2.

Fig. 9. Fig. 10.

Now, the needle intersects one of the lines if

$$x < \frac{l}{2}\cos\varphi$$

and then the point x, $\varphi$ lies in the shaded area below the curve

$$x = \frac{l}{2}\cos\varphi.$$

Since the distribution of x, $\varphi$ is uniform, the required probability will be

$$p = \frac{\text{Area } OAD}{\text{Area } OABC}.$$

But

$$\text{Area } OAD = \frac{l}{2}\int_0^{\pi/2}\cos\varphi\,d\varphi = \frac{l}{2}, \qquad
\text{Area } OABC = \frac{\pi}{2}\cdot\frac{d}{2}$$

and consequently

$$p = \frac{2l}{\pi d}.$$

On pages 112-113 an account was given of experiments made by several
authors in connection with Buffon's problem. They all show good
agreement with the theory and indirectly confirm the hypothesis assumed in
deriving the above expression for probability.
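
The hypothesis used in the solution (x uniform between 0 and d/2, the angle uniform between 0 and π/2, the two independent) translates directly into a simulated experiment. The Python sketch below is a minimal illustration; the function name, seed and trial count are assumptions made here.

```python
import math
import random


def buffon_probability(l, d, trials=200_000, seed=0):
    """Estimate the probability that a needle of length l < d meets a line."""
    if not 0 < l < d:
        raise ValueError("this sketch assumes 0 < l < d")
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, d / 2)          # distance of the middle point from the nearest line
        phi = rng.uniform(0.0, math.pi / 2)  # acute angle between OP and the needle
        if x < (l / 2) * math.cos(phi):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    l, d = 1.0, 2.0
    print(buffon_probability(l, d), 2 * l / (math.pi * d))
```

For l = 1 and d = 2 the estimate should lie close to 1/π.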

15. Extension of Buffon's Problem. A thin plate in the shape of a
convex polygon, of dimensions so small that it cannot intersect two of
the lines simultaneously, is thrown on a board ruled as in Buffon's needle
problem. What is the probability that the boundary of the plate will
intersect one of the lines?

Solution. Suppose that the polygonal boundary has five sides.
Let these sides (and their lengths) be denoted by

$$\alpha, \ \beta, \ \gamma, \ \delta, \ \epsilon.$$

Each of them is shorter than the distance d between two consecutive
lines. On account of convexity, a line can intersect either none or two
(and only two) sides. Accordingly, combining sides in pairs, we can
distinguish 10 mutually exclusive cases and denote their probabilities by

$$(\alpha\beta), \ (\alpha\gamma), \ (\alpha\delta), \ (\alpha\epsilon), \ (\beta\gamma), \
(\beta\delta), \ (\beta\epsilon), \ (\gamma\delta), \ (\gamma\epsilon), \ (\delta\epsilon).$$

The required probability will be given by the sum

$$p = (\alpha\beta) + (\alpha\gamma) + (\alpha\delta) + (\alpha\epsilon) + (\beta\gamma) +
(\beta\delta) + (\beta\epsilon) + (\gamma\delta) + (\gamma\epsilon) + (\delta\epsilon).$$

On the other hand, the side $\alpha$ can be intersected by a line in four mutually
exclusive ways; namely, together with $\beta$, or $\gamma$, or $\delta$, or $\epsilon$. Hence, if
$(\alpha)$ is the probability of intersection

$$(\alpha) = (\alpha\beta) + (\alpha\gamma) + (\alpha\delta) + (\alpha\epsilon)$$

and similarly

$$(\beta) = (\beta\alpha) + (\beta\gamma) + (\beta\delta) + (\beta\epsilon)$$
$$(\gamma) = (\gamma\alpha) + (\gamma\beta) + (\gamma\delta) + (\gamma\epsilon)$$
$$(\delta) = (\delta\alpha) + (\delta\beta) + (\delta\gamma) + (\delta\epsilon)$$
$$(\epsilon) = (\epsilon\alpha) + (\epsilon\beta) + (\epsilon\gamma) + (\epsilon\delta)$$

whence

$$(\alpha) + (\beta) + (\gamma) + (\delta) + (\epsilon) = 2p.$$

But

$$(\alpha) = \frac{2\alpha}{\pi d}, \quad (\beta) = \frac{2\beta}{\pi d}, \quad
(\gamma) = \frac{2\gamma}{\pi d}, \quad (\delta) = \frac{2\delta}{\pi d}, \quad
(\epsilon) = \frac{2\epsilon}{\pi d}$$

and consequently

$$p = \frac{\alpha + \beta + \gamma + \delta + \epsilon}{\pi d} = \frac{P}{\pi d}$$

where P is the perimeter of the polygonal boundary. Evidently this
result is perfectly general. Since it does not depend upon the number of
sides, by passage to the limit, it can be extended to the case of a plate
bounded by any convex curve.
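
The formula P/(πd) for a convex plate lends itself to the same kind of simulated test. In the Python sketch below the plate is a hypothetical right triangle with sides 0.3, 0.4, 0.5 thrown on lines at distance d = 1; the position is modeled (an assumption of this sketch) by a uniform height of one vertex within a strip and a uniform orientation, and the plate is counted as intersected when its vertices lie in different strips.

```python
import math
import random


def perimeter(vertices):
    """Perimeter of the closed polygon with the given vertices."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def plate_crossing_probability(vertices, d, trials=200_000, seed=0):
    """Estimate the probability that a small convex plate meets one of the lines."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        theta = rng.uniform(0.0, 2.0 * math.pi)   # orientation of the plate
        offset = rng.uniform(0.0, d)              # height of the reference vertex
        s, c = math.sin(theta), math.cos(theta)
        heights = [offset + x * s + y * c for x, y in vertices]
        # The lines are at heights 0, d, 2d, ...; the plate is cut by a line
        # when its lowest and highest vertices fall in different strips.
        if math.floor(min(heights) / d) != math.floor(max(heights) / d):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    plate = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.4)]
    print(plate_crossing_probability(plate, 1.0), perimeter(plate) / math.pi)
```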

16. Second Solution of Buffon's Problem. Barbier has given another
extremely ingenious solution of Buffon's problem and of its extension.
Let f(l) be an unknown probability that the needle will intersect a line.
Imagine that the needle is divided into two parts l' and l''. Evidently a
line intersects the needle if, and only if, it intersects either the first or
the second part. Hence, by the theorem of total probabilities

$$f(l' + l'') = f(l') + f(l''),$$

whence, as in Sec. 4, we conclude

$$f(l) = Cl$$

where C is a constant independent of l. The whole question is how to
determine this constant. Barbier's ingenious idea was to let this
problem depend on the solution of another one: A polygonal line (convex
or not) is thrown upon the board; what is the mathematical expectation
of the number of points of intersection? The perimeter of the polygonal
line can be subdivided into n rectilinear parts $a_1, a_2, \ldots a_n$ all less than
d. With these n parts we can associate n variables $x_1, x_2, \ldots x_n$, such
that

$$x_i = 1 \ \text{if one of the lines intersects } a_i, \qquad x_i = 0 \ \text{otherwise.}$$

The sum

$$s = x_1 + x_2 + \cdots + x_n$$

evidently gives the total number of the points of intersection. Hence

$$E(s) = E(x_1) + E(x_2) + \cdots + E(x_n)$$

and, if $p_i$ is the probability of intersection of $a_i$ with one (and only one)
line,

$$E(x_i) = p_i.$$

But, according to the previous result,

$$p_i = C a_i.$$

Hence, we have a perfectly general formula

$$E(s) = C(a_1 + a_2 + \cdots + a_n) = CP$$

where P is the perimeter of the polygonal line. The result holds for any
curvilinear arc (closed or not) as can be seen by the method of limits.
This formula applied to a circle with the diameter d gives

$$C \cdot \pi d = 2$$

since such a circle has always exactly two points of intersection with
the lines of the system. Thus we find that

$$C = \frac{2}{\pi d}$$

and

$$f(l) = \frac{2l}{\pi d}$$

as obtained before. For a closed convex line of sufficiently small
dimensions only two cases are possible: two intersections (probability p), or
none (probability 1 - p), whence E(s) = 2p and

$$2p = \frac{2P}{\pi d}$$

or

$$p = \frac{P}{\pi d}$$

in agreement with the result obtained in Sec. 15.
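
Barbier's formula for the expected number of intersections, E(s) = 2P/(πd), holds for a polygonal line that need not be convex or closed, and this too can be illustrated by simulation. The Python sketch below uses a hypothetical bent needle made of two perpendicular parts of length 0.6 on lines at distance d = 1; the throwing model (uniform height of the first vertex, uniform orientation) and all names are assumptions of this sketch.

```python
import math
import random


def expected_crossings(points, d, trials=200_000, seed=0):
    """Estimate E(s) for an open polygonal line whose parts are all shorter than d."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        offset = rng.uniform(0.0, d)
        s, c = math.sin(theta), math.cos(theta)
        # Index of the strip in which each vertex falls (lines at 0, d, 2d, ...).
        strips = [math.floor((offset + x * s + y * c) / d) for x, y in points]
        # A part shorter than d meets at most one line, so each difference
        # below is the variable x_i of the text (0 or 1).
        total += sum(abs(strips[i + 1] - strips[i]) for i in range(len(points) - 1))
    return total / trials


if __name__ == "__main__":
    bent_needle = [(0.0, 0.0), (0.6, 0.0), (0.6, 0.6)]   # perimeter P = 1.2
    print(expected_crossings(bent_needle, 1.0), 2 * 1.2 / math.pi)
```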

17. Laplace's Problem. A board is covered with a set of congruent
rectangles as shown in the figure, and a thin needle is thrown on the
board. Supposing that the needle is shorter than the smaller sides of the
rectangles, find the probability that the needle will be entirely contained
in one of the rectangles of the set.

Solution. Let AB = a, AD = b be the sides of the rectangle which
contains the middle point of the needle, the length of which is

$$l \qquad (l < a, \ l < b).$$

Taking AB and AD for coordinate axes, the position of the needle is
determined by two coordinates x, y of its middle point and the angle
$\varphi$ formed by the needle with the x axis. We consider x, y, $\varphi$ as three
independent variables with uniform distribution of probability. The
domain filled up with all possible points x, y, $\varphi$ is a parallelepipedon

$$0 < x < a; \quad 0 < y < b; \quad -\frac{\pi}{2} < \varphi < \frac{\pi}{2}$$

and the distribution of probability throughout this domain is uniform.
To characterize the domain of points representing positions of the
middle point of the needle when it is located entirely within ABCD we
consider the sections of that domain by planes $\varphi$ = constant and their
projections on the plane xy. These projections are represented by
the shaded areas in Figs. 13 and 14 corresponding to positive and negative
$\varphi$, respectively.

In Fig. 13

$$\angle PAB = \varphi; \qquad AP \parallel BF \parallel CR \parallel DG$$

and $AP = BE = BF = CR = DG = DH = \tfrac{1}{2}l$.
Similarly, in the second figure

$$\angle JAB = \varphi; \qquad AJ \parallel BQ \parallel CL \parallel DS$$

and $AJ = AK = BQ = CL = CM = DS = \tfrac{1}{2}l$.

The area of the rectangle PQRS corresponding to these two cases can be
expressed as follows:

$$\text{Area } PQRS = (a - l\cos\varphi)(b - l\sin\varphi) = ab - l(b\cos\varphi + a\sin\varphi) + l^2\sin\varphi\cos\varphi,$$

$$\text{Area } PQRS = (a - l\cos\varphi)(b + l\sin\varphi) = ab - l(b\cos\varphi - a\sin\varphi) - l^2\sin\varphi\cos\varphi.$$

Without distinguishing positive and negative values of $\varphi$ we may write

$$F(\varphi) = \text{area } PQRS = ab - bl\cos\varphi - al|\sin\varphi| + \tfrac{1}{2}l^2|\sin 2\varphi|.$$

The volume of the domain representing positions of the needle entirely
within ABCD is:

$$v = \int_{-\pi/2}^{\pi/2} F(\varphi)\,d\varphi = \pi ab - 2bl - 2al + l^2$$

while

$$V = \pi ab$$

is the volume of the domain

$$0 < x < a, \quad 0 < y < b, \quad -\frac{\pi}{2} < \varphi < \frac{\pi}{2}.$$

Hence, the required probability is:

$$p = 1 - \frac{2l(a + b) - l^2}{\pi ab}$$

and the complementary probability for the needle to intersect the
boundary of one of the rectangles is:

$$1 - p = \frac{2l(a + b) - l^2}{\pi ab}.$$

Buffon's problem may be considered as a limiting case when $a = \infty$
and, indeed, by setting $a = \infty$, we find that

$$1 - p = \frac{2l}{\pi b}$$

in conformity with the result in Sec. 14.
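
Laplace's result can also be tested by a simulated experiment under the same hypothesis (middle point uniform in the rectangle, angle uniform between -π/2 and π/2, all three variables independent). The Python sketch below is an illustration only; the function name, the sample dimensions, the seed and the trial count are assumptions made here.

```python
import math
import random


def laplace_inside_probability(a, b, l, trials=200_000, seed=0):
    """Estimate the probability that the needle lies entirely within its rectangle."""
    if not (0 < l < a and l < b):
        raise ValueError("this sketch assumes l < a and l < b")
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        x = rng.uniform(0.0, a)
        y = rng.uniform(0.0, b)
        phi = rng.uniform(-math.pi / 2, math.pi / 2)
        dx = (l / 2) * math.cos(phi)
        dy = (l / 2) * math.sin(phi)
        # The needle is inside the (convex) rectangle when both ends are.
        ends_ok = all(
            0.0 <= u <= a and 0.0 <= v <= b
            for u, v in ((x + dx, y + dy), (x - dx, y - dy))
        )
        if ends_ok:
            inside += 1
    return inside / trials


if __name__ == "__main__":
    a, b, l = 2.0, 1.5, 1.0
    exact = 1 - (2 * l * (a + b) - l * l) / (math.pi * a * b)
    print(laplace_inside_probability(a, b, l), exact)
```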

These examples may suffice to give an idea of problems in geometric 
probabilities. Sylvester, Crofton, and others have enriched this field 
by extremely ingenious methods of evaluating, or rather of avoiding 
evaluations, of very complicated multiple integrals. However, from the 
standpoint of principles, these investigations, ingenious as they are, 
do not contribute much to the general theory of probability. 


Problems for Solution 

1. A point X is taken at random on a rectilinear segment AB = l whose middle
point is O. What is the probability that AX, BX, and AO can form a triangle? The
distribution of AX = x is assumed to be uniform. Ans. $\tfrac{1}{2}$.

2. Two points $X_1$, $X_2$ are taken at random on AB = l.
Assuming uniform distribution of probability, what is the mathematical
expectation of any power n of the distance between $X_1$ and $X_2$?

Fig. 15.

Ans.

$$\frac{1}{l^2}\int_0^l\int_0^l |x_1 - x_2|^n\,dx_1\,dx_2 = \frac{2l^n}{(n + 1)(n + 2)}.$$

3. Three points $X_1$, $X_2$, $X_3$ are taken at random on AB. What is the probability
that $X_3$ lies between $X_1$ and $X_2$?

Ans. $\tfrac{1}{3}$, assuming uniform distribution of probability.

4. A rectilinear segment AB is divided into four equal parts

$$AC = CO = OD = DB.$$

Fig. 16.

Supposing that the distribution of probability is symmetric with respect to O, let P
be the probability that a point selected at random on AB will be between C and D.
Also, let Q be the probability that the middle point between
two points selected at random will be between C and D. Prove
that

$$Q > \frac{1 + P^2}{2}.$$

Hint: The middle point of a segment $X_1X_2$ is surely between C and D if: (i) $X_1$
and $X_2$ are in CO; or (ii) $X_1$ and $X_2$ are in OD; or (iii) $X_1$ and $X_2$ are on opposite sides
of O.

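5. Two points $X_1$, $X_2$ are chosen at random in a circle of radius r. Assuming
uniform distribution of probability, what is the mathematical expectation of their
distance? Ans. Denoting the required mathematical expectation by M, we have

$$\pi^2 r^4 M = \int_0^{2\pi}\int_0^{2\pi} F(r, \theta, \theta')\,d\theta\,d\theta'$$

where

$$F(r, \theta, \theta') = \int_0^r\int_0^r \sqrt{\rho^2 + \rho'^2 - 2\rho\rho'\cos(\theta - \theta')}\;\rho\rho'\,d\rho\,d\rho'.$$

Hence, varying r by dr

$$dF = 2r\,dr\int_0^r \sqrt{r^2 + \rho^2 - 2r\rho\cos(\theta - \theta')}\;\rho\,d\rho$$

and

$$d(\pi^2 r^4 M) = 4\pi r\,dr\int_0^{2\pi}\int_0^r \sqrt{r^2 + \rho^2 - 2r\rho\cos\omega}\;\rho\,d\rho\,d\omega.$$

By introduction of new polar coordinates the integral in the right member can be
exhibited as

$$\int_{-\pi/2}^{\pi/2} d\omega\int_0^{2r\cos\omega} u^2\,du = \frac{8r^3}{3}\int_{-\pi/2}^{\pi/2}\cos^3\omega\,d\omega = \frac{32r^3}{9}.$$

Fig. 17.

Thus

$$d(\pi^2 r^4 M) = \frac{128\pi}{9}\,r^4\,dr$$

whence

$$M = \frac{128r}{45\pi}.$$
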

6. A board is covered with congruent rectangles as in Laplace's problem. A coin
the diameter of which is less than the smaller side of the rectangles is thrown on the
board. What is the probability that it will be partly in one rectangle and partly in
another? Ans. a, b, r being respectively the sides of the rectangles and radius of the
coin, the required probability is

$$\frac{2r(a + b - 2r)}{ab}.$$

7. Solve Buffon's problem when the needle is longer than the distance between
two consecutive lines. Ans. The probability for the needle to intersect at least one
line is

$$p = \frac{2l}{\pi d}(1 - \sin\varphi_0) + \frac{2\varphi_0}{\pi}$$

where $\varphi_0$ is determined by $\cos\varphi_0 = d/l$.

8. A board is covered with congruent triangles whose sides are a, b, c. A needle
whose length l is less than the shortest altitude of any one of these triangles is thrown
on the board. What is the probability that the needle will be contained entirely
within one of the triangles? Ans. The required probability is

$$p = 1 + \frac{(Aa^2 + Bb^2 + Cc^2)\,l^2}{2\pi Q^2} - \frac{(4a + 4b + 4c - 3l)\,l}{2\pi Q}$$

where A, B, C are angles opposite to sides a, b, c and Q is double the area of the triangle.
For equilateral triangles, setting $A = B = C = \pi/3$ and $Q = \tfrac{\sqrt{3}}{2}a^2$ in this
expression,

$$p = 1 + \frac{2}{3}\left(\frac{l}{a}\right)^2 - \frac{\sqrt{3}}{\pi}\,\frac{l}{a}\left(4 - \frac{l}{a}\right).$$

9. On each of the circles $O_1, O_2, O_3, \ldots$ with respective radii $r_1, r_2, r_3, \ldots$
points $M_1, M_2, M_3, \ldots$ are taken at random. Supposing that the series

$$r_1 + r_2 + r_3 + \cdots$$

is divergent, while the series

$$r_1^2 + r_2^2 + r_3^2 + \cdots$$

is convergent, prove that the probability that the length of the vector

$$OM = O_1M_1 + O_2M_2 + O_3M_3 + \cdots + O_nM_n$$

will be > R tends to 0 as $R \to \infty$ no matter how large n is.

Fig. 18.

Indication of Solution. Let $x_1, x_2, \ldots x_n$; $y_1, y_2, \ldots y_n$ be components of
$O_1M_1, O_2M_2, \ldots O_nM_n$ on two rectangular axes OX, OY. Then

$$E(x_i) = E(y_i) = 0, \qquad E(x_i^2) = E(y_i^2) = \frac{r_i^2}{2}.$$

By Tshebysheff's lemma (Chap. X, Sec. 1) the probabilities Q and Q' of the inequalities

$$|x_1 + x_2 + \cdots + x_n| > t\sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{2}}, \qquad
|y_1 + y_2 + \cdots + y_n| > t\sqrt{\frac{r_1^2 + r_2^2 + \cdots + r_n^2}{2}}$$

are both less than $1/t^2$. Now, if the length OM > R then either

$$|x_1 + x_2 + \cdots + x_n| > \frac{R}{\sqrt{2}}$$

or

$$|y_1 + y_2 + \cdots + y_n| > \frac{R}{\sqrt{2}}.$$

Hence, taking t so that the right members of the former inequalities are equal to
$R/\sqrt{2}$, the probability P for the length of OM to be > R is less than Q + Q';
that is,

$$P < Q + Q' < \frac{2\sigma}{R^2},$$

where $\sigma$ denotes the sum of the convergent series $r_1^2 + r_2^2 + r_3^2 + \cdots$.

10. Prove that

$$\lim_{n \to \infty}\int_0^1\int_0^1\cdots\int_0^1
\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n}\,dx_1\,dx_2 \cdots dx_n = \frac{2}{3}.$$

Hint: Considering $x_1, x_2, \ldots x_n$ as continuous stochastic variables with uniform
distribution over the interval (0, 1) prove with the help of Tshebysheff's inequality
that the probability of

$$\frac{2}{3} - \epsilon < \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n} < \frac{2}{3} + \epsilon$$

for any $\epsilon > 0$ tends to 1 as $n \to \infty$.

References 

E. Czuber: "Geometrische Wahrscheinlichkeiten und Mittelwerte," Leipzig, 1884.

E. Czuber: "Wahrscheinlichkeitsrechnung," 1, pp. 75-109, Leipzig, 1908.

H. Poincaré: "Calcul des probabilités," pp. 118-152, Paris, 1912.

W. Crofton: On the Theory of Local Probability Applied to Straight Lines Drawn at
Random in a Plane, the Method Used Being Also Extended to the Proof of
Certain New Theorems in Integral Calculus, Philos. Trans., vol. 158, 1868.

W. Crofton: Probability, "Encyclopaedia Britannica," 9th ed.



CHAPTER XIII 

THE GENERAL CONCEPT OF DISTRIBUTION 

1. In dealing with continuous stochastic variables we have introduced
the important concept of the function of distribution. Denoting the
density of probability by f(z), this function was defined by

$$F(t) = \int_{-\infty}^{t} f(z)\,dz$$

and it represents the probability of the inequality

$$x < t.$$

For a variable with a finite number of values the function of
distribution can be defined as the sum

$$F(t) = \sum_{x_i < t} p_i$$

where $p_1, p_2, \ldots p_n$ are respective probabilities of all possible values
$x_1, x_2, \ldots x_n$ of the variable x. The notation $x_i < t$ is intended to
show that the summation is extended over all values of x less than t.
Again, F(t) for any real t represents the probability of the inequality

$$x < t.$$

In this case F(t) is a discontinuous function, never decreasing and varying
between $F(-\infty) = 0$ and $F(+\infty) = 1$. Its discontinuities are located
at the points $x_1, x_2, \ldots x_n$ and are such that

$$F(x_i + 0) - F(x_i - 0) = p_i,$$

denoting, in the customary way,

$$F(x_i + 0) = \lim F(x_i + \epsilon)$$
$$F(x_i - 0) = \lim F(x_i - \epsilon)$$

when $\epsilon$, through positive values, converges to 0. To represent F(t)
graphically we note that

$$\begin{aligned}
F(t) &= 0 && \text{for} \quad t < x_1 \\
F(t) &= p_1 && \text{for} \quad x_1 < t < x_2 \\
F(t) &= p_1 + p_2 && \text{for} \quad x_2 < t < x_3 \\
&\ \ \vdots \\
F(t) &= p_1 + p_2 + \cdots + p_n && \text{for} \quad x_n < t.
\end{aligned}$$

As for the value of F(t) at the point $t = x_i$, it is $F(x_i - 0)$. Hence,
the graph of F(t) consists of rectilinear segments as shown in the figure
(for n = 4; $x_1 = -2$; $x_2 = 0$; $x_3 = 1$; $x_4 = 3$; $p_1 = p_2 = p_3 = p_4 = \tfrac{1}{4}$)

and belongs to the so-called step lines.
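
The step function of the example (values -2, 0, 1, 3, each with probability 1/4) can be written out as a short Python sketch. The helper name `distribution_function` is an assumption made here; the strict inequality in the sum reproduces the convention that F(t) at a point of discontinuity equals the limit from the left.

```python
def distribution_function(values, probs):
    """Return F with F(t) = sum of p_i over all x_i < t (strict inequality)."""
    pairs = sorted(zip(values, probs))

    def F(t):
        return sum(p for x, p in pairs if x < t)

    return F


if __name__ == "__main__":
    F = distribution_function([-2, 0, 1, 3], [0.25, 0.25, 0.25, 0.25])
    for t in (-3, -2, -1, 0, 0.5, 1, 2, 3, 4):
        print(t, F(t))
```

At t = 0, for instance, the function still has the value 1/4 and jumps to 1/2 only for t greater than 0.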

Thus, in case of a continuous variable the distribution function is 
given by an integral, and in case of a discontinuous variable, by a sum. 
In stating theorems equally true for continuous and discontinuous 
variables, it would be tedious always to distinguish these two cases. 
The question naturally arises whether it is possible to represent
distribution functions, moments, and similar quantities by using new symbols
equally applicable to continuous and discontinuous variables. In a
similar kind of investigation Stieltjes was confronted with the same
difficulties and he succeeded in overcoming them by introducing a new
kind of integrals known as "Stieltjes' integrals."

Fig. 19.

Stieltjes' Integrals

2. Let $\varphi(x)$ be a never decreasing function defined in the interval
$a \leq x \leq b$. For any particular value $x_0$ of the argument both the limits
(for $\epsilon$ converging to 0 through positive values)

$$\lim \varphi(x_0 + \epsilon) = \varphi(x_0 + 0)$$
$$\lim \varphi(x_0 - \epsilon) = \varphi(x_0 - 0)$$

exist. Since evidently

$$\varphi(x_0 - 0) \leq \varphi(x_0) \leq \varphi(x_0 + 0),$$

$x_0$ is a point of continuity of $\varphi(x)$ if

$$\varphi(x_0 - 0) = \varphi(x_0 + 0).$$

If, however,

$$\varphi(x_0 - 0) < \varphi(x_0 + 0)$$

$\varphi(x)$ is discontinuous at $x_0$, and the difference

$$w_0 = \varphi(x_0 + 0) - \varphi(x_0 - 0)$$

gives the measure of discontinuity or simply discontinuity. Since
for any number of points of discontinuity $x_0, x_1, \ldots x_n$ the sum of
discontinuities

$$w_0 + w_1 + \cdots + w_n \leq \varphi(b) - \varphi(a)$$

the points of discontinuity form a countable set. For there are only a
finite number of discontinuities above any given number, so that,
considering the sequence

$$\delta > \delta_1 > \delta_2 > \cdots$$

tending to 0, there is only a finite number of points with discontinuities
$> \delta$; also a finite number of points with discontinuities $\leq \delta$ and $> \delta_1$,
and so on. It follows that points of discontinuity can be arranged into
a single sequence and hence form a countable set.

It may happen, however, that $\varphi(x)$ may have discontinuities in any
interval, no matter how small; but at any rate there are points of
continuity in any interval. If $\varphi(x_0 + \epsilon) > \varphi(x_0 - \epsilon)$ for all sufficiently small
$\epsilon > 0$ the point $x_0$ is called a "point of increase" of $\varphi(x)$. In particular,
any point of discontinuity is a point of increase.

3. Let f(x) be a continuous function in the interval $a \leq x \leq b$. By
inserting points $x_1 < x_2 < \cdots < x_n$ this interval is subdivided into
n + 1 partial intervals. In each of these we arbitrarily select points
$\xi_0, \xi_1, \ldots \xi_n$ and form the sum

$$S = f(\xi_0)[\varphi(x_1) - \varphi(a)] + f(\xi_1)[\varphi(x_2) - \varphi(x_1)] + \cdots + f(\xi_n)[\varphi(b) - \varphi(x_n)].$$

It can be proved in the same way as for ordinary integrals that when
all intervals

$$x_1 - a, \ x_2 - x_1, \ \ldots \ b - x_n$$

tend to zero uniformly, the sum S tends to a definite limit. This limit,
called Stieltjes' integral, does not depend upon the manner of subdividing
the interval (a, b) or upon the choice of points $\xi_0, \xi_1, \ldots \xi_n$. It has
a perfectly definite value as soon as f(x) and $\varphi(x)$ (together with a, b)
are given, and accordingly is denoted by

$$\int_a^b f(x)\,d\varphi(x).$$

In case $\varphi(x)$ has a continuous derivative, $d\varphi(x)$ can be interpreted
as the ordinary differential; Stieltjes' integral then coincides with the
ordinary one. In other cases $d\varphi(x)$ is a new symbol introduced as a
reminder of the origin of Stieltjes' integral. In particular, if $\varphi(x)$ is a
step function with discontinuities $p_1, p_2, p_3, \ldots$ at the points $x_1,
x_2, x_3, \ldots$, Stieltjes' integral coincides with the sum

$$\sum p_i f(x_i)$$

which is a finite sum or an absolutely convergent infinite series according
as the set of points of discontinuity is finite or infinite.
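
The defining sum S can be computed directly for a fine subdivision. The Python sketch below is an illustration under assumptions made here: equal partial intervals, the middle of each interval as the selected point, and two hypothetical integrators, one with a continuous derivative (x squared on (0, 1)) and one a step function with four jumps of 1/4.

```python
def stieltjes_sum(f, phi, a, b, n=20_000):
    """Sum of f(xi_k) * [phi(x_{k+1}) - phi(x_k)] over n equal partial intervals."""
    h = (b - a) / n
    total = 0.0
    prev = phi(a)
    for k in range(n):
        right = b if k == n - 1 else a + (k + 1) * h
        cur = phi(right)
        xi = a + (k + 0.5) * h        # selected point: middle of the interval
        total += f(xi) * (cur - prev)
        prev = cur
    return total


def step(t):
    # Step function with jumps of 1/4 at the points -2, 0, 1, 3.
    return sum(0.25 for x in (-2, 0, 1, 3) if x < t)


if __name__ == "__main__":
    # Integrator with a continuous derivative: the result approaches the
    # ordinary integral of x * 2x over (0, 1), that is 2/3.
    print(stieltjes_sum(lambda x: x, lambda x: x * x, 0.0, 1.0))
    # Step integrator: the result approaches the sum of p_i f(x_i), here
    # (4 + 0 + 1 + 9) / 4 = 3.5 for f(x) = x^2.
    print(stieltjes_sum(lambda x: x * x, step, -5.0, 5.0))
```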



Stieltjes' integrals possess many properties of ordinary integrals.
For instance, the mean-value theorem holds for them in the form:

$$\int_a^b f(x)\,d\varphi(x) = f(\xi)[\varphi(b) - \varphi(a)]$$

where $a \leq \xi \leq b$. Also, if f(x) has a continuous derivative, we have an
analogue for the integration by parts

$$\int_a^b f(x)\,d\varphi(x) = f(b)\varphi(b) - f(a)\varphi(a) - \int_a^b \varphi(x)\,df(x)$$

where df(x) means an ordinary differential and the integral in the right
member is an ordinary integral. However, some important properties
of ordinary integrals do not hold universally for Stieltjes' integrals. For
instance, considered as functions of b or a, they may have discontinuities.

In the definition of Stieltjes' integral it was assumed that a and b
were finite numbers. Stieltjes' integral over the interval $-\infty, +\infty$ is
defined in an ordinary way as being the limit of

$$\int_a^b f(x)\,d\varphi(x)$$

when a and b tend independently to $-\infty$ and $+\infty$, respectively. In
other words,

$$\int_{-\infty}^{\infty} f(x)\,d\varphi(x) = \lim \int_a^b f(x)\,d\varphi(x) \quad\text{when}\quad a \to -\infty, \ b \to +\infty,$$

provided this limit exists. If it does not exist, the symbol

$$\int_{-\infty}^{\infty} f(x)\,d\varphi(x)$$

has no meaning.

The General Concept of Distribution

4. The most general type of distribution function of probability,
covering all imaginable cases, is given by a never decreasing function
F(t) defined for all real values of t and varying from $F(-\infty) = 0$ to
$F(+\infty) = 1$. If at points of discontinuity we set

$$F(t) = F(t - 0),$$

then for any t the probability of the inequality

$$x < t$$

will be given by F(t). Also, the probability of the inequalities

$$t_1 \leq x < t_2$$

will be

$$F(t_2) - F(t_1).$$

The case of continuous F(t), having a continuous derivative f(t)
(save for a finite set of points of discontinuity), corresponds to a
continuous variable distributed with the density f(t), since

$$F(t) = \int_{-\infty}^{t} f(x)\,dx.$$

If F(t) is a step function with a finite number of discontinuities, it
characterizes the distribution of probability of a variable with a finite number
of values. Finally, if F(t) is a step function with an infinite set of
discontinuities distributed without density, it corresponds to a variable
whose values can be arranged in a sequence according to their magnitude.
These are the most important types of variables considered in the
calculus of probability, and for all of them the distribution function can
be represented by Stieltjes' integral

$$F(t) = \int_{-\infty}^{t} dF(x).$$

The mathematical expectation of any continuous function f(x) is
defined by Stieltjes' integral

$$E(f(x)) = \int_{-\infty}^{\infty} f(x)\,dF(x)$$

provided it has a meaning. In particular, moments of the order n (n
positive integer) and absolute moments of the order a (a real) are defined,
respectively, by

$$m_n = \int_{-\infty}^{\infty} x^n\,dF(x), \qquad \mu_a = \int_{-\infty}^{\infty} |x|^a\,dF(x)$$

and we always have

$$|m_n| \leq \mu_n.$$

Finally,

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$$

is the characteristic function of distribution. Since the integral exists
for any real t, this function is defined for all real values t and satisfies the
inequality

$$|\varphi(t)| \leq 1.$$
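
For a step function F(t) these Stieltjes integrals reduce to finite sums, which makes the definitions easy to try out. The Python sketch below is an illustration only: the distribution (values -2, 0, 1, 3 with probabilities 1/4 each) and the function names are assumptions made here.

```python
import cmath


def moment(values, probs, n):
    """m_n = sum of p_i * x_i^n (Stieltjes integral of x^n for a step function F)."""
    return sum(p * x ** n for x, p in zip(values, probs))


def abs_moment(values, probs, a):
    """mu_a = sum of p_i * |x_i|^a."""
    return sum(p * abs(x) ** a for x, p in zip(values, probs))


def characteristic_function(values, probs, t):
    """phi(t) = sum of p_i * exp(i t x_i)."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(values, probs))


if __name__ == "__main__":
    xs, ps = [-2, 0, 1, 3], [0.25, 0.25, 0.25, 0.25]
    print(moment(xs, ps, 1), abs_moment(xs, ps, 1))    # |m_1| does not exceed mu_1
    print(abs(characteristic_function(xs, ps, 0.7)))   # modulus does not exceed 1
```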


Inequalities for Moments 

5. Moments of any distribution satisfy certain inequalities, which
it is important to know. They all are particular cases of the following
very general inequality due to Liapounoff.

Liapounoff's Inequality. Let a, b, c be three real numbers satisfying
the inequalities

$$a \geq b \geq c \geq 0$$

and $\mu_a$, $\mu_b$, $\mu_c$ absolute moments of orders a, b, c for an arbitrary
distribution. Then the following inequality holds:

$$\mu_b^{\,a-c} \leq \mu_c^{\,a-b}\,\mu_a^{\,b-c}.$$

Proof. a. Let $p_1, p_2, \ldots p_n$; $x_1, x_2, \ldots x_n$ be positive numbers
and

$$\varphi(s) = p_1x_1^s + p_2x_2^s + \cdots + p_nx_n^s.$$

Then for arbitrary real numbers $s_1, s_2, \ldots s_p$ the following inequality
holds:

$$(1) \qquad \left[\varphi\!\left(\frac{s_1 + s_2 + \cdots + s_p}{p}\right)\right]^p \leq \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p).$$

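For p = 2 this inequality follows immediately from the known inequality
due to Cauchy:

$$\left(\sum_1^n a_ib_i\right)^2 \leq \sum_1^n a_i^2 \cdot \sum_1^n b_i^2$$

by taking in it

$$a_i = \sqrt{p_ix_i^{s_1}}, \qquad b_i = \sqrt{p_ix_i^{s_2}}.$$

For p = 4 we have

$$\left[\varphi\!\left(\frac{s_1 + s_2 + s_3 + s_4}{4}\right)\right]^4 \leq
\left[\varphi\!\left(\frac{s_1 + s_2}{2}\right)\right]^2\left[\varphi\!\left(\frac{s_3 + s_4}{2}\right)\right]^2 \leq
\varphi(s_1)\varphi(s_2)\varphi(s_3)\varphi(s_4)$$

and continuing in the same manner we find in general that

$$\left[\varphi\!\left(\frac{s_1 + s_2 + \cdots + s_{2^m}}{2^m}\right)\right]^{2^m} \leq \varphi(s_1)\varphi(s_2)\cdots\varphi(s_{2^m}).$$

Let m be taken so that $2^m > p$ and let us take in the last inequality

$$s_{p+1} = s_{p+2} = \cdots = s_{2^m} = s = \frac{s_1 + s_2 + \cdots + s_p}{p}.$$

Since

$$\frac{s_1 + s_2 + \cdots + s_{2^m}}{2^m} = \frac{ps + (2^m - p)s}{2^m} = s$$

we shall have

$$\varphi(s)^{2^m} \leq \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p)\,\varphi(s)^{2^m - p},$$

whence

$$\varphi(s)^p \leq \varphi(s_1)\varphi(s_2)\cdots\varphi(s_p),$$

which is inequality (1).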

b. Let $a \ge b \ge c \ge 0$ be integers. Taking p = a - c;
$s_1 = s_2 = \cdots = s_{a-b} = c$; $s_{a-b+1} = \cdots = s_{a-c} = a$, we have

$$\frac{s_1+s_2+\cdots+s_{a-c}}{a-c} = \frac{(a-b)c + (b-c)a}{a-c} = b$$

and consequently, by virtue of (1),

$$\left(\sum_{i=1}^{n} p_i x_i^{b}\right)^{a-c} \le \left(\sum_{i=1}^{n} p_i x_i^{c}\right)^{a-b}\left(\sum_{i=1}^{n} p_i x_i^{a}\right)^{b-c}. \tag{2}$$

If a = p/s, b = q/s, c = r/s are rational numbers ($a \ge b \ge c \ge 0$),
it suffices to take, in (2), p, q, r instead of a, b, c, replace $x_i$ by $x_i^{1/s}$ and
raise both members to the power 1/s to ascertain that (2) holds for
rational a, b, c. Finally, the passage to the limit makes it clear that (2)
holds for real a, b, c, provided $a \ge b \ge c \ge 0$.

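c. Let the interval A to B be subdivided into partial intervals by
inserting numbers $t_1 < t_2 < \cdots < t_n$ between A and B and let

$$p_0 = F(t_1) - F(A), \quad p_1 = F(t_2) - F(t_1), \quad \ldots, \quad p_n = F(B) - F(t_n);$$

$$x_0 = |A|, \quad x_1 = |t_1|, \quad \ldots, \quad x_n = |t_n|.$$

Then the three sums

$$\sum_{i=0}^{n} p_i x_i^{a}, \qquad \sum_{i=0}^{n} p_i x_i^{b}, \qquad \sum_{i=0}^{n} p_i x_i^{c}$$

will tend to the respective limits

$$\int_A^B |t|^{a}\,dF(t), \qquad \int_A^B |t|^{b}\,dF(t), \qquad \int_A^B |t|^{c}\,dF(t)$$

when all differences $t_1 - A$, $t_2 - t_1$, ..., $B - t_n$ tend to 0 uniformly.

Hence, passing to the limit in (2), we get

$$\left(\int_A^B |t|^{b}\,dF(t)\right)^{a-c} \le \left(\int_A^B |t|^{c}\,dF(t)\right)^{a-b}\left(\int_A^B |t|^{a}\,dF(t)\right)^{b-c}$$

and finally, letting A tend to $-\infty$ and B to $+\infty$,

$$\left(\int_{-\infty}^{\infty} |t|^{b}\,dF(t)\right)^{a-c} \le \left(\int_{-\infty}^{\infty} |t|^{c}\,dF(t)\right)^{a-b}\left(\int_{-\infty}^{\infty} |t|^{a}\,dF(t)\right)^{b-c}$$

or

$$\mu_b^{\,a-c} \le \mu_c^{\,a-b}\,\mu_a^{\,b-c}$$

as stated.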
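Taking $b = \dfrac{a+c}{2}$, Liapounoff's inequality becomes

$$\mu_{\frac{a+c}{2}}^{\,a-c} \le \mu_c^{\frac{a-c}{2}}\,\mu_a^{\frac{a-c}{2}},$$

whence

$$\mu_{\frac{a+c}{2}}^{2} \le \mu_a\,\mu_c$$

for any two real positive numbers a and c. If k and l are two positive
integers and we take c = 2k, a = 2l, then

$$\mu_{k+l}^{2} \le \mu_{2k}\,\mu_{2l}$$

or

$$m_{k+l}^{2} \le m_{2k}\,m_{2l}$$

since

$$|m_{k+l}| \le \mu_{k+l} \quad\text{and}\quad \mu_{2k} = m_{2k},\ \mu_{2l} = m_{2l}.$$

Another important inequality results if we take c = 0. Then, since
$\mu_0 = 1$,

$$\mu_b^{a} \le \mu_a^{b}$$

or

$$\mu_b^{\frac{1}{b}} \le \mu_a^{\frac{1}{a}}$$

if a > b > 0. This amounts to

$$\frac{\log \mu_b}{b} \le \frac{\log \mu_a}{a} \quad\text{if } a > b,$$

which is equivalent to the statement that

$$\frac{\log \mu_x}{x}$$

is an increasing function of x for positive x.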
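
Liapounoff's inequality and its corollary for c = 0 are easy to check numerically on a small discrete distribution. The sketch below is only an illustration: the four values, their probabilities, and the function names `abs_moment` and `liapounoff_gap` are hypothetical choices, not taken from the text.

```python
def abs_moment(values, probs, order):
    """Absolute moment of the given order for a discrete distribution."""
    return sum(p * abs(x) ** order for x, p in zip(values, probs))


def liapounoff_gap(values, probs, a, b, c):
    """Right member minus left member of Liapounoff's inequality.

    The result should be nonnegative whenever a >= b >= c >= 0.
    """
    mu_a = abs_moment(values, probs, a)
    mu_b = abs_moment(values, probs, b)
    mu_c = abs_moment(values, probs, c)
    return mu_c ** (a - b) * mu_a ** (b - c) - mu_b ** (a - c)


# Hypothetical example distribution (an assumption made for illustration).
values = [-2.0, -0.5, 1.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]

gap = liapounoff_gap(values, probs, a=3.0, b=2.0, c=0.5)

# mu_x^(1/x) for increasing orders x; the sequence should not decrease.
orders = (0.5, 1.0, 2.0, 3.0, 4.0)
roots = [abs_moment(values, probs, x) ** (1.0 / x) for x in orders]
```

Here `gap` comes out positive and the entries of `roots` increase with the order, in agreement with the two inequalities above.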


Composition of Distribution Functions 
6. An important problem in the calculus of probability is to find the
distribution function of the sum of several independent variables when
distribution functions of these variables are known. It suffices to show
how this problem can be solved for the sum of two independent variables.

Let x and y be two independent variables with the corresponding
distribution functions $F(t)$ and $G(t)$. To find the distribution function
$H(t)$ of their sum

$$z = x + y$$

is the same as to find the probability of the inequality

$$x + y < t$$

for an arbitrary real number t. Here, for the sake of simplicity and in
view of the applications we propose to consider later, we shall assume that
one, at least, of the variables x, y has continuous distribution with
generally continuous density.

At first, let both x and y have continuous distributions so that

$$F(t) = \int_{-\infty}^{t} f(x)\,dx, \qquad G(t) = \int_{-\infty}^{t} g(y)\,dy.$$

The probability of the inequality

$$x + y < t$$

according to the general principles stated in Chap. XII is expressed by
the double integral

$$H(t) = \iint f(x)g(y)\,dx\,dy$$

extended over the domain

$$x + y < t.$$

Now, following ordinary rules, we can reduce this double integral to a
repeated integral. To this end, for any fixed x we integrate $g(y)$ between
limits $-\infty$ and $t - x$, thus obtaining

$$\int_{-\infty}^{t-x} g(y)\,dy = G(t - x).$$

Then, after multiplying by $f(x)$, we integrate the resulting expression
between limits $-\infty$ and $+\infty$ for x. The final result will be

$$H(t) = \int_{-\infty}^{\infty} G(t - x) f(x)\,dx$$

or, written as Stieltjes' integral,

$$H(t) = \int_{-\infty}^{\infty} G(t - x)\,dF(x).$$

In the second place, let x be a discontinuous variable with different
values $x_1, x_2, x_3, \ldots$ and corresponding probabilities $p_1, p_2, p_3, \ldots$.
For $x = x_i$ the inequality

$$x + y < t$$

is equivalent to

$$y < t - x_i$$

and the probability of this inequality is $G(t - x_i)$. Since the probability
of $x = x_i$ is $p_i$, the compound probability of the two events

$$x = x_i, \qquad x + y < t$$

will be

$$p_i\,G(t - x_i).$$

The total probability $H(t)$ of the inequality

$$x + y < t$$

will be expressed by the sum

$$H(t) = \sum_i p_i\,G(t - x_i)$$

extended over all possible values of x. But this sum can again be written
as Stieltjes' integral:

$$H(t) = \int_{-\infty}^{\infty} G(t - x)\,dF(x). \tag{1}$$

In both cases we obtain the same expression for $H(t)$. Evidently
$H(t)$ can also be defined as the mathematical expectation of $G(t - x)$:

$$H(t) = E\{G(t - x)\}$$

taken with respect to the variable x. The important formula (1) is
known as the formula for composition of distribution functions $F(t)$
and $G(t)$.
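
For a discontinuous x the composition formula is a finite sum and can be evaluated directly. The following sketch assumes, purely for illustration, that x takes the values -1 and +1 with probability 1/2 each and that y is normally distributed with mean 0 and standard deviation 1; the names `normal_cdf` and `compose` are hypothetical.

```python
import math


def normal_cdf(t, sigma=1.0):
    """Distribution function G(t) of a normal variable with mean 0."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))


def compose(values, probs, G, t):
    """H(t) = sum of p_i * G(t - x_i) over the possible values x_i of x."""
    return sum(p * G(t - x) for x, p in zip(values, probs))


# Assumed example: x = -1 or +1 with equal probabilities, y standard normal.
values = [-1.0, 1.0]
probs = [0.5, 0.5]

h_at_zero = compose(values, probs, normal_cdf, 0.0)
h_curve = [compose(values, probs, normal_cdf, t) for t in (-6.0, -1.0, 0.0, 1.0, 6.0)]
```

By symmetry `h_at_zero` equals 1/2, and the values in `h_curve` increase from nearly 0 to nearly 1, as a distribution function must.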

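Example. Let x and y be two normally distributed variables with means = 0
and respective standard deviations $\sigma_1$ and $\sigma_2$. Instead of using (1), it is better to
write $H(t)$ as a double integral

$$H(t) = \frac{1}{2\pi\sigma_1\sigma_2}\iint e^{-\frac{x^2}{2\sigma_1^2}-\frac{y^2}{2\sigma_2^2}}\,dx\,dy$$

extended over the domain

$$x + y < t.$$

To evaluate this integral, it is natural to introduce x + y = z as a new variable and
find constants C, D, $\alpha$, $\beta$ so as to have identically

$$\frac{x^2}{2\sigma_1^2}+\frac{y^2}{2\sigma_2^2} = C(x+y)^2 + D(\alpha x+\beta y)^2,$$

whence one easily finds

$$C = \frac{1}{2(\sigma_1^2+\sigma_2^2)}, \quad D = \frac{1}{2\sigma_1^2\sigma_2^2(\sigma_1^2+\sigma_2^2)}, \quad \alpha = \sigma_2^2, \quad \beta = -\sigma_1^2$$

so that

$$\frac{x^2}{2\sigma_1^2}+\frac{y^2}{2\sigma_2^2} = \frac{1}{2(\sigma_1^2+\sigma_2^2)}\left\{(x+y)^2 + \left(\frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y\right)^2\right\}.$$

The Jacobian of

$$z = x + y, \qquad u = \frac{\sigma_2}{\sigma_1}x - \frac{\sigma_1}{\sigma_2}y$$

with respect to x, y being

$$\begin{vmatrix} 1 & 1 \\ \dfrac{\sigma_2}{\sigma_1} & -\dfrac{\sigma_1}{\sigma_2} \end{vmatrix} = -\frac{\sigma_1^2+\sigma_2^2}{\sigma_1\sigma_2},$$

$H(t)$ can be presented as the double integral

$$H(t) = \frac{1}{2\pi(\sigma_1^2+\sigma_2^2)}\iint e^{-\frac{z^2+u^2}{2(\sigma_1^2+\sigma_2^2)}}\,dz\,du$$

with the domain of integration defined by a single inequality:

$$z < t.$$

Hence,

$$H(t) = \frac{1}{2\pi(\sigma_1^2+\sigma_2^2)}\int_{-\infty}^{t} e^{-\frac{z^2}{2(\sigma_1^2+\sigma_2^2)}}\,dz\int_{-\infty}^{\infty} e^{-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)}}\,du = \frac{1}{\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}}\int_{-\infty}^{t} e^{-\frac{z^2}{2(\sigma_1^2+\sigma_2^2)}}\,dz$$

since

$$\int_{-\infty}^{\infty} e^{-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)}}\,du = \sqrt{2\pi(\sigma_1^2+\sigma_2^2)}.$$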


The expression obtained for $H(t)$ leads to a remarkable conclusion:
The sum of two normally distributed variables with means = 0 and
standard deviations $\sigma_1$ and $\sigma_2$ is also a normally distributed variable with
the mean = 0 and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$. If the means
of x and y are $a_1$ and $a_2$, then evidently z will be normally distributed
with the mean $a = a_1 + a_2$ and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$.
Repeated application of this result leads to the following important
theorem:

If $x_1, x_2, \ldots, x_n$ are normally distributed independent variables with
means $a_1, a_2, \ldots, a_n$ and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$, then their sum

$$z = x_1 + x_2 + \cdots + x_n$$

is again normally distributed with the mean $a = a_1 + a_2 + \cdots + a_n$
and the standard deviation $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}$.
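
The conclusion for two variables can be confirmed by evaluating the composition integral numerically and comparing it with the normal distribution function whose standard deviation is $\sqrt{\sigma_1^2+\sigma_2^2}$. This is a rough sketch only: the trapezoidal rule, the truncation of the integral at ten standard deviations, the sample values 3 and 4, and all function names are assumptions made for illustration.

```python
import math


def normal_pdf(x, sigma):
    """Density of a normal variable with mean 0 and standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(t, sigma):
    """Distribution function of a normal variable with mean 0."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))


def composed_cdf(t, sigma1, sigma2, steps=4000):
    """Approximate H(t) = integral of G(t - x) f(x) dx by the trapezoidal rule.

    f is the normal density with deviation sigma1, G the normal distribution
    function with deviation sigma2; the range is cut at +-10 * sigma1.
    """
    lo, hi = -10.0 * sigma1, 10.0 * sigma1
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_cdf(t - x, sigma2) * normal_pdf(x, sigma1)
    return total * dx


# With sigma1 = 3 and sigma2 = 4 the sum should have standard deviation 5.
differences = [
    abs(composed_cdf(t, 3.0, 4.0) - normal_cdf(t, 5.0)) for t in (-1.0, 0.0, 2.0)
]
```

The entries of `differences` are negligibly small, which is what the theorem predicts.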

Finally, any linear function

$$u = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$$

is normally distributed with the mean $a = c_1 a_1 + c_2 a_2 + \cdots + c_n a_n$
and the standard deviation $\sigma = \sqrt{c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2}$. In
particular, the arithmetic mean

$$\frac{x_1 + x_2 + \cdots + x_n}{n}$$

of identical normally distributed variables with the mean a and the
standard deviation $\sigma$ is normally distributed about the mean a and with
the standard deviation $\sigma/\sqrt{n}$. Hence, the conclusion may be drawn
that the probability P of the inequality

$$\left|\frac{x_1 + x_2 + \cdots + x_n}{n} - a\right| < \epsilon$$

is given by

$$P = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\int_{-\epsilon}^{\epsilon} e^{-\frac{n x^2}{2\sigma^2}}\,dx = \frac{2}{\sqrt{2\pi}}\int_{0}^{\frac{\epsilon\sqrt{n}}{\sigma}} e^{-\frac{u^2}{2}}\,du$$

and rapidly approaches 1 as n increases. This is a more definite form
of the law of large numbers applied to normally distributed (identical or
equal) variables.
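
How rapidly P approaches 1 can be seen from a few values. The sketch below expresses P through the error function of Python's standard library, using the identity $\frac{2}{\sqrt{2\pi}}\int_0^T e^{-u^2/2}\,du = \operatorname{erf}(T/\sqrt{2})$; the function name `mean_within` and the sample values of $\epsilon$, n and $\sigma$ are assumptions made for illustration.

```python
import math


def mean_within(eps, n, sigma):
    """Probability that the mean of n identical independent normal variables
    with standard deviation sigma differs from its expectation by less than eps."""
    return math.erf(eps * math.sqrt(n) / (sigma * math.sqrt(2.0)))


# Assumed sample values: eps = 0.1, sigma = 1, and increasing n.
table = [(n, mean_within(0.1, n, 1.0)) for n in (100, 400, 1600, 10000)]
```

For n = 100 the argument $\epsilon\sqrt{n}/\sigma$ equals 1 and P is about 0.683; for n = 10000 it differs from 1 by a quantity far below any practical interest.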


Determination of Distribution When Its Characteristic Function Is Given

7. One of the most important conclusions to be drawn from the
preceding considerations is that the distribution function of probability
is uniquely determined by the characteristic function. The known
proofs of this fact are rather subtle, owing to the use of conditionally
convergent integrals. However, such integrals can be avoided by resorting
to an ingenious device due to Liapounoff. In the general case, the
distribution function of a variable x has discontinuities. To avoid the
bad effect of these discontinuities, Liapounoff introduces a continuous
variable y that, with reasonable probability, can have values only in the
vicinity of 0. It may be surmised, therefore, that the continuous
distribution function of the sum x + y will approximately represent that
of x and, by disposing of a parameter involved in the distribution function
of y, will tend to it as a limit. To make these explanations more definite,
let y be a normally distributed variable whose distribution function is

$$G(y) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{y} e^{-\frac{x^2}{h^2}}\,dx.$$

When h is small, the probabilities of any one of the inequalities

$$y > \epsilon, \qquad y < -\epsilon$$

will be extremely small and even will tend to 0 when h tends to 0. Hence,
the distribution function $H(t)$ of the sum x + y is likely to tend to
$F(t)$ as a limit when h tends to 0.

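To prove this in all rigor, we apply the composition formula (Sec. 6)
to our case. We obtain the following expression for $H(t)$:

$$H(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{t-x} e^{-\frac{z^2}{h^2}}\,dz$$

or, in more convenient form,

$$H(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du;$$

and furthermore, integrating by parts,

$$H(t) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx.$$

The integral in the right member can be split into three parts

$$\frac{1}{h\sqrt{\pi}}\int_{-\infty}^{t-\epsilon} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx + \frac{1}{h\sqrt{\pi}}\int_{t-\epsilon}^{t+\epsilon} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx + \frac{1}{h\sqrt{\pi}}\int_{t+\epsilon}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx.$$

Now, for positive T

$$\frac{1}{\sqrt{\pi}}\int_{T}^{\infty} e^{-u^2}\,du < \frac{1}{2}e^{-T^2}.$$

Making use of this inequality, we find that

$$\frac{1}{h\sqrt{\pi}}\int_{t+\epsilon}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx \le \frac{1}{h\sqrt{\pi}}\int_{t+\epsilon}^{\infty} e^{-\left(\frac{t-x}{h}\right)^2}\,dx = \frac{1}{\sqrt{\pi}}\int_{\frac{\epsilon}{h}}^{\infty} e^{-u^2}\,du < \frac{1}{2}e^{-\frac{\epsilon^2}{h^2}}$$

and similarly

$$\frac{1}{h\sqrt{\pi}}\int_{-\infty}^{t-\epsilon} e^{-\left(\frac{t-x}{h}\right)^2}F(x)\,dx < \frac{1}{2}e^{-\frac{\epsilon^2}{h^2}}$$

so that

$$H(t) = \frac{1}{h\sqrt{\pi}}\int_{0}^{\epsilon} e^{-\frac{u^2}{h^2}}F(t+u)\,du + \frac{1}{h\sqrt{\pi}}\int_{0}^{\epsilon} e^{-\frac{u^2}{h^2}}F(t-u)\,du + \theta e^{-\frac{\epsilon^2}{h^2}}; \qquad 0 < \theta < 1.$$

Given an arbitrary $\sigma > 0$, the number $\epsilon$ can be taken so small that

$$0 \le F(t+u) - F(t+0) < \sigma$$

$$0 \le F(t-0) - F(t-u) < \sigma$$

for $0 < u < \epsilon$, whence

$$\left|\frac{1}{h\sqrt{\pi}}\int_{0}^{\epsilon} e^{-\frac{u^2}{h^2}}F(t+u)\,du - \frac{F(t+0)}{h\sqrt{\pi}}\int_{0}^{\epsilon} e^{-\frac{u^2}{h^2}}\,du\right| < \frac{\sigma}{h\sqrt{\pi}}\int_{0}^{\epsilon} e^{-\frac{u^2}{h^2}}\,du < \sigma,$$

with a similar inequality for the integral containing $F(t-u)$, and

$$H(t) = \frac{F(t+0)+F(t-0)}{\sqrt{\pi}}\int_{0}^{\frac{\epsilon}{h}} e^{-u^2}\,du + \theta'\left(2\sigma + e^{-\frac{\epsilon^2}{h^2}}\right); \qquad |\theta'| < 1.$$

On the other hand,

$$\frac{1}{\sqrt{\pi}}\int_{0}^{\frac{\epsilon}{h}} e^{-u^2}\,du = \frac{1}{2} - \frac{1}{\sqrt{\pi}}\int_{\frac{\epsilon}{h}}^{\infty} e^{-u^2}\,du = \frac{1}{2} - \frac{\theta''}{2}e^{-\frac{\epsilon^2}{h^2}}; \qquad 0 < \theta'' < 1,$$

so that finally

$$\left|H(t) - \frac{F(t+0)+F(t-0)}{2}\right| < 2\sigma + 2e^{-\frac{\epsilon^2}{h^2}},$$

and for all sufficiently small h ($\epsilon$ being kept fixed)

$$\left|H(t) - \frac{F(t+0)+F(t-0)}{2}\right| < 4\sigma;$$

that is,

$$\lim_{h\to 0} H(t) = \frac{F(t+0)+F(t-0)}{2}$$

or, if t is a point of continuity,

$$\lim_{h\to 0} H(t) = F(t).$$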

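Now we must find another analytical representation for $H(t)$. To
this end we consider the difference

$$H(t) - H(0) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF(x)\int_{-\frac{x}{h}}^{\frac{t-x}{h}} e^{-u^2}\,du,$$

and, to represent in a convenient way the inner integral, we make use
of the known integral

$$\frac{1}{2\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\frac{v^2}{4}}e^{-iuv}\,dv = e^{-u^2}.$$

Multiplying both sides by du and integrating between $-\dfrac{x}{h}$ and $\dfrac{t-x}{h}$
we find, after replacing v by hv,

$$\int_{-\frac{x}{h}}^{\frac{t-x}{h}} e^{-u^2}\,du = \frac{1}{2\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}e^{ivx}\,\frac{1-e^{-itv}}{iv}\,dv$$

and

$$H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dF(x)\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}e^{ivx}\,\frac{1-e^{-itv}}{iv}\,dv.$$

The next step is to reverse the order of integrations, an operation
which can be easily justified in this case. The result will be:

$$H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\,\frac{1-e^{-itv}}{iv}\,dv\int_{-\infty}^{\infty} e^{ivx}\,dF(x)$$

or

$$H(t) - H(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\varphi(v)\,\frac{1-e^{-ivt}}{iv}\,dv$$

since

$$\varphi(v) = \int_{-\infty}^{\infty} e^{ivx}\,dF(x).$$

Now, taking the limit of $H(t)$ for h converging to 0, we have at any point
of continuity of $F(t)$

$$F(t) = C + \frac{1}{2\pi}\lim_{h\to 0}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\varphi(v)\,\frac{1-e^{-ivt}}{iv}\,dv \tag{2}$$

where the constant

$$C = \frac{F(+0)+F(-0)}{2}$$

is determined by the condition $F(-\infty) = 0$. Thus, the distribution
function is completely determined by (2) at all points of continuity when
the characteristic function $\varphi(v)$ is given.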
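
Formula (2) lends itself to a direct numerical check. The sketch below keeps h small but fixed instead of passing to the limit, folds the integral onto positive v (the integrand at -v is the complex conjugate of the integrand at v), and uses the midpoint rule on a truncated range; the values of h, the cutoff `v_max`, the number of steps and the function names are all assumptions made for illustration, not part of the text.

```python
import cmath
import math


def cdf_increment(phi, t, h=0.05, v_max=200.0, steps=20000):
    """Approximate F(t) - F(0) from the characteristic function phi.

    Evaluates (1/pi) * integral over v > 0 of the real part of
    exp(-h^2 v^2 / 4) * phi(v) * (1 - exp(-i v t)) / (i v)
    by the midpoint rule, with a small fixed h in place of the limit h -> 0.
    """
    dv = v_max / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * dv  # midpoints avoid the removable singularity at v = 0
        kernel = (1.0 - cmath.exp(-1j * v * t)) / (1j * v)
        total += math.exp(-h * h * v * v / 4.0) * (phi(v) * kernel).real
    return total * dv / math.pi


def normal_phi(v):
    """Characteristic function of the normal law with mean 0, deviation 1."""
    return math.exp(-v * v / 2.0)


def cauchy_phi(v):
    """Characteristic function exp(-|v|) (Cauchy's law with parameter 1)."""
    return math.exp(-abs(v))


normal_value = cdf_increment(normal_phi, 1.0)
cauchy_value = cdf_increment(cauchy_phi, 1.0)
```

With these settings `normal_value` is close to 0.3413, the normal probability of the interval (0, 1), and `cauchy_value` is close to $\frac{1}{\pi}\arctan 1 = 0.25$; the small discrepancies come from keeping h positive.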


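Example 1. Let us apply (2) to find the distribution corresponding to the
characteristic function

$$\varphi(v) = e^{-\frac{\sigma^2 v^2}{2}}.$$

Since in this case the integral whose limit we seek is uniformly convergent with
respect to h, we find simply

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-\frac{\sigma^2 v^2}{2}}\,\frac{1-e^{-itv}}{iv}\,dv = C + \frac{1}{\pi}\int_{0}^{\infty} e^{-\frac{\sigma^2 v^2}{2}}\,\frac{\sin tv}{v}\,dv.$$

On the other hand (Chap. VII, page 128),

$$\frac{1}{\pi}\int_{0}^{\infty} e^{-\frac{\sigma^2 v^2}{2}}\,\frac{\sin tv}{v}\,dv = \frac{1}{\sigma\sqrt{2\pi}}\int_{0}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du.$$

Taking $t = -\infty$, the condition $F(-\infty) = 0$ gives

$$C = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{0} e^{-\frac{u^2}{2\sigma^2}}\,du$$

and so finally

$$F(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2\sigma^2}}\,du.$$

Naturally, we find a normal distribution with the standard deviation $\sigma$ (compare page
270).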

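Example 2. What is the distribution determined by the characteristic function
$\varphi(v) = e^{-a|v|}$; a > 0?

As in the preceding example we find that

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-a|v|}\,\frac{1-e^{-itv}}{iv}\,dv = C + \frac{1}{\pi}\int_{0}^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv.$$

But

$$\frac{d}{dt}\,\frac{1}{\pi}\int_{0}^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv = \frac{1}{\pi}\int_{0}^{\infty} e^{-av}\cos tv\,dv = \frac{1}{\pi}\,\frac{a}{a^2+t^2},$$

whence

$$\frac{1}{\pi}\int_{0}^{\infty} e^{-av}\,\frac{\sin tv}{v}\,dv = \frac{a}{\pi}\int_{0}^{t}\frac{dx}{a^2+x^2} = \frac{a}{\pi}\int_{-\infty}^{t}\frac{dx}{a^2+x^2} - \frac{1}{2}.$$

Thus

$$F(t) = C - \frac{1}{2} + \frac{a}{\pi}\int_{-\infty}^{t}\frac{dx}{a^2+x^2}$$

and the condition $F(-\infty) = 0$ gives $C = \frac{1}{2}$, so that finally

$$F(t) = \frac{a}{\pi}\int_{-\infty}^{t}\frac{dx}{a^2+x^2}.$$

Naturally we find the same distribution as that considered in Example 2, page 243.
Sometimes it is called "Cauchy's distribution" with the parameter a.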

Composition of Characteristic Functions 
8. Given n independent variables $x_1, x_2, \ldots, x_n$ whose characteristic
functions are $\varphi_1(t), \varphi_2(t), \ldots, \varphi_n(t)$, the product

$$\varphi(t) = \varphi_1(t)\varphi_2(t)\cdots\varphi_n(t)$$

is the characteristic function of their sum

$$s = x_1 + x_2 + \cdots + x_n.$$

In fact, the characteristic function of s is by definition

$$\varphi(t) = E(e^{its}) = E(e^{itx_1}\cdot e^{itx_2}\cdots e^{itx_n}).$$

Since $x_1, x_2, \ldots, x_n$ are independent variables, the expectation of the
product

$$e^{itx_1}\cdot e^{itx_2}\cdots e^{itx_n}$$

is equal to the product of the expectations of the factors, whence

$$\varphi(t) = \varphi_1(t)\varphi_2(t)\cdots\varphi_n(t).$$

This simple theorem is of great importance since it determines the
characteristic function of the sum of independent variables and indirectly
its function of distribution.
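
For variables with a finite number of values the theorem can be verified by direct computation: the distribution of the sum is obtained by combining all pairs of values, and its characteristic function is compared with the product of the two given ones. The sketch below uses a die and a coin as assumed examples; the representation of a distribution as a dictionary and the function names are illustrative choices only.

```python
import cmath


def char_fn(dist, t):
    """E(exp(i t x)) for a discrete distribution given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in dist.items())


def sum_distribution(d1, d2):
    """Distribution of the sum of two independent discrete variables."""
    out = {}
    for x, p in d1.items():
        for y, q in d2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out


# Assumed examples: an ordinary die and a coin marked 0 and 1.
die = {k: 1.0 / 6.0 for k in range(1, 7)}
coin = {0: 0.5, 1: 0.5}
total = sum_distribution(die, coin)

errors = [
    abs(char_fn(total, t) - char_fn(die, t) * char_fn(coin, t))
    for t in (-2.0, 0.3, 1.0, 5.5)
]
```

The entries of `errors` vanish up to rounding, as the theorem requires.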

9. A few examples will illustrate the preceding remark.

Example 1. Consider n independent normally distributed variables $x_1, x_2, \ldots, x_n$
with means = 0 and standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_n$. Their characteristic
functions are

$$\varphi_k(t) = e^{-\frac{\sigma_k^2 t^2}{2}}; \qquad k = 1, 2, \ldots, n$$

and the characteristic function of their sum

$$s = x_1 + x_2 + \cdots + x_n$$

will be

$$\varphi(t) = e^{-\frac{\sigma^2 t^2}{2}}$$

where

$$\sigma^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2.$$

Hence s is a normally distributed variable with the mean 0 and the standard deviation

$$\sigma = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2}$$

as we found previously by a method involving a considerable amount of calculation.

Example 2. Independent variables $x_1, x_2, \ldots, x_n$ have Cauchy's distributions
with parameters $a_1, a_2, \ldots, a_n$. Since the characteristic function of $x_k$ is

$$\varphi_k(t) = e^{-a_k|t|},$$

the characteristic function of the sum

$$s = x_1 + x_2 + \cdots + x_n$$

will be

$$\varphi(t) = e^{-a|t|}$$

where

$$a = a_1 + a_2 + \cdots + a_n.$$

Hence, s again has Cauchy's distribution with the parameter $a_1 + a_2 + \cdots + a_n$.
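Example 3. Let $x_1, x_2, \ldots, x_n$ be independent variables with uniform distribution
of probability in the interval (0, 1). The characteristic function of any one of
them is

$$\int_{0}^{1} e^{itx}\,dx = \frac{e^{it}-1}{it}.$$

Hence, the characteristic function of their sum s will be

$$\varphi(t) = \left(\frac{e^{it}-1}{it}\right)^{n}.$$

The distribution function of s is given by

$$F(t) = C + \frac{1}{2\pi}\lim_{h\to 0}\int_{-\infty}^{\infty} e^{-\frac{h^2v^2}{4}}\left(\frac{e^{iv}-1}{iv}\right)^{n}\frac{1-e^{-itv}}{iv}\,dv$$

and, since the integral again is uniformly convergent,

$$F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\frac{e^{iv}-1}{iv}\right)^{n}\frac{1-e^{-itv}}{iv}\,dv.$$

The evaluation of this integral presents certain difficulties. To avoid them we
notice that the integrand considered as a function of a complex variable v is
holomorphic everywhere. Hence, we can substitute for the rectilinear path of
integration the path $\Gamma$ as shown in Fig. 20, which avoids the origin O by a
small detour in the upper half plane.

[Fig. 20. The path $\Gamma$ along the real axis, with a detour around the origin O.]

Now it is easy to show that integrating over the path $\Gamma$ we have

$$J(p) = \frac{1}{2\pi}\int_{\Gamma}\frac{e^{ipz}}{(iz)^{n+1}}\,dz = \begin{cases} 0 & \text{if } p \ge 0 \\[4pt] -\dfrac{p^{n}}{n!} & \text{if } p \le 0. \end{cases}$$

The integral

$$\frac{1}{2\pi}\int_{\Gamma}\left(\frac{e^{iz}-1}{iz}\right)^{n}\frac{dz}{iz}$$

being a linear combination of integrals of the type $J(p)$ with $p \ge 0$ reduces to 0.
Similarly,

$$-\frac{1}{2\pi}\int_{\Gamma}\left(\frac{e^{iz}-1}{iz}\right)^{n}\frac{e^{-itz}}{iz}\,dz = -\sum_{k=0}^{n}(-1)^{n-k}\binom{n}{k}J(k-t)$$

or, in explicit form,

$$\frac{1}{n!}\sum_{0 \le k < t}(-1)^{k}\binom{n}{k}(t-k)^{n}.$$

Referring to the above expression of $F(t)$, we find that

$$F(t) = C + \frac{1}{n!}\sum_{0 \le k < t}(-1)^{k}\binom{n}{k}(t-k)^{n}.$$

The constant C = 0 since $F(t)$ and the sum in the right member both vanish for
t = 0. The final expression of $F(t)$ is, therefore:

$$F(t) = \frac{1}{n!}\left[t^{n} - \frac{n}{1}(t-1)^{n} + \frac{n(n-1)}{1\cdot 2}(t-2)^{n} - \cdots\right].$$

The series in the right member is continued as long as arguments remain positive.
Such is the probability that the sum

$$x_1 + x_2 + \cdots + x_n$$

of n independent variables, uniformly distributed throughout the interval (0, 1), will
be less than t. The above expression is due to Laplace, who, however, obtained it in
quite a different manner.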
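
Laplace's expression is simple to evaluate. The sketch below sums the series as long as the arguments t - k remain positive; the function name `uniform_sum_cdf` and the explicit handling of t outside the interval (0, n) are assumptions added for convenience.

```python
import math


def uniform_sum_cdf(t, n):
    """Probability that the sum of n independent variables, uniformly
    distributed over (0, 1), is less than t (Laplace's formula)."""
    if t <= 0.0:
        return 0.0
    if t >= n:
        return 1.0
    total = 0.0
    k = 0
    # The series is continued as long as the argument t - k remains positive.
    while k <= n and t - k > 0.0:
        total += (-1) ** k * math.comb(n, k) * (t - k) ** n
        k += 1
    return total / math.factorial(n)


half_values = [uniform_sum_cdf(n / 2.0, n) for n in (1, 2, 3)]
```

By symmetry of the distribution about n/2, every entry of `half_values` equals 1/2; for instance, with n = 3 and t = 1.5 the formula gives $(1.5^3 - 3\cdot 0.5^3)/6 = 0.5$.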


Problems for Solution 
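1. Prove directly the inequality

$$\mu_{\frac{a+c}{2}}^{2} \le \mu_a\,\mu_c$$

for absolute moments.

Hint: The quadratic form in $\lambda$, $\mu$

$$\int_{-\infty}^{\infty}\left(\lambda|x|^{\frac{a}{2}} + \mu|x|^{\frac{c}{2}}\right)^{2}d\varphi(x)$$

is definite or semidefinite. Show that the equality sign cannot hold if $\varphi(x)$ has at
least two points of increase $\alpha$, $\beta$ such that $\alpha : \beta$ is neither 0 nor $\pm 1$.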

2. Let X\y Xi, . . . Xn be n variables. Denoting the absolute moment of the order 
a for Xi by Ma \ and by m the quotient 


• • • + 4 :>. ^ 

040 +4.) + ... 


prove that 

if «' > « > 0 . 


1 1 




Hint: Use Liapounoff’s inequality. 

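3. A variable is distributed over the interval $(0, +\infty)$ with a decreasing density of
probability. Show that in this case moments $M_2$ and $M_4$ satisfy the inequality

$$M_2^{2} \le \tfrac{5}{9}M_4 \qquad \text{(Gauss)}$$

and that in general

$$[(\mu+1)M_\mu]^{\frac{1}{\mu}} \le [(\nu+1)M_\nu]^{\frac{1}{\nu}}$$

if $\nu > \mu > 0$.

Indication of the Proof. Show first that the existence of the integral

$$\int_{0}^{\infty} x^{\nu}f(x)\,dx$$

in case $f(x)$ is a positive and decreasing function implies the existence of the limit

$$\lim a^{\nu+1}f(a) = 0; \qquad a \to +\infty.$$

Hence, deduce that

$$\int_{0}^{\infty} x\,d\varphi(x) = 1, \qquad \int_{0}^{\infty} x^{\mu+1}\,d\varphi(x) = (\mu+1)M_\mu, \qquad \int_{0}^{\infty} x^{\nu+1}\,d\varphi(x) = (\nu+1)M_\nu$$

where $\varphi(x) = f(0) - f(x)$ and, finally, apply the inequality

$$\left(\int_{0}^{\infty} x^{\mu+1}\,d\varphi(x)\right)^{\nu} \le \left(\int_{0}^{\infty} x\,d\varphi(x)\right)^{\nu-\mu}\left(\int_{0}^{\infty} x^{\nu+1}\,d\varphi(x)\right)^{\mu}.$$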

4. Using the composition formula (1), page 269, prove Laplace's formula on
page 278 by mathematical induction.

5. Prove that the distribution function of probability for a variable whose
characteristic function $\varphi(t)$ is given can be determined by the formula

$$F(t) = C + \lim_{h\to 0}\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\varphi(v)}{1+h^2v^2}\,\frac{1-e^{-itv}}{iv}\,dv.$$

Hint: In carrying out Liapounoff's idea, take an auxiliary variable with the
distribution

$$G(y) = \frac{1}{2h}\int_{-\infty}^{y} e^{-\frac{|x|}{h}}\,dx.$$

Also make use of the integral 



if* e-*^dx 




Many definite integrals can be evaluated using the relation between characteristic 
and distribution functions, as the following example shows. 

6. Let X be distributed over (— «, + «) with the density The character¬ 

istic function being in this case 


we find 


whence 


v>(<) 


_ 1 _ 

1 +<* 


^’(0 



1 - ^ 

-dr 

w(l 4- r*) 



e“‘*'dx, 

CO 



1 4-V* 


-dv 


an integral due to Laplace. 

7. A variable is said to have Poisson's distribution if it can have only integral
values 0, 1, 2, . . . and the probability of x = k is

$$\frac{a^{k}e^{-a}}{k!};$$

the quantity a is called "parameter" of distribution. If n independent variables have Poisson's
distribution with parameters $a_1, a_2, \ldots, a_n$, show that their sum has also Poisson's
distribution, the parameter of which is $a_1 + a_2 + \cdots + a_n$.
8. Prove the following result:

$$\frac{1}{2} + \frac{1}{\pi}\int_{0}^{\infty}\left(\frac{\sin v}{v}\right)^{n}\frac{\sin tv}{v}\,dv = \frac{1}{2^{n}n!}\left[(t+n)^{n} - \frac{n}{1}(t+n-2)^{n} + \frac{n(n-1)}{1\cdot 2}(t+n-4)^{n} - \cdots\right],$$

the series being continued as long as arguments remain positive.

Hint: Consider the sum of n uniformly distributed variables in the interval
(-1, +1) and express its distribution function in two different ways.

9. Establish the expression for the mathematical expectation of the absolute
value of the sum of n uniformly distributed variables in the interval $(-\tfrac{1}{2}, +\tfrac{1}{2})$.
Ans. 


E\xi + *1 + 


+ *•1 » 


2*4-6 


(2n -f 2)1 




2)*+i + 





the series being continued as long as the arguments remain positive. 

Hint: Apply Laplace's formula on page 278, conveniently modified, to express the 
expectation of -f ** 4- • * * + *• and that of |a?i -f + * * ' + *i»|. 

10. Show that under the same conditions as in Prob. 9


, n r * /sin A" W ( — < cos ( 

«|x. +*. + • • +*.1 j - dt. 

Hint: Prove and use the following formula 

— 1 — iyfx 

- V — dx - -»M. 

-r X* 

11. Let $x_1$ and $x_2$ be two identical and normally distributed variables with the
mean = 0 and the standard deviation $\sigma$. If x is defined as the greater of the values
$|x_1|$, $|x_2|$, that is,

$$x = \max(|x_1|, |x_2|),$$

find the mean value of x as well as that of $x^2$. Ans.



12. Let

$$x = \min(|x_1|, |x_2|, \ldots, |x_n|)$$

where $x_1, x_2, \ldots, x_n$ are identical normally distributed variables with the mean = 0
and the standard deviation $\sigma$. Find the mean value of x. Ans. Setting for brevity

cVrJo 

we have 

- J["U 





In particular forn » 2 


For large n asymptotically 


E(x) - -^(V2 - 1). 
V» 


B(x) 


•y/7h 

~ n + 1 


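13. A variable with the mean 0 and the standard deviation 1 is called a
"reduced variable." By changing the origin and the unit of measurement any
variable can be made reduced. For, if x has the mean a and the standard deviation $\sigma$
the variable

$$u = \frac{x - a}{\sigma}$$

is reduced. The distribution function of the reduced variable u can be called the
"reduced law of distribution."

As we have seen, independent variables $x_1$ and $x_2$ with normal distribution have the same
reduced law of distribution, as does their sum. The question may be raised: Is the
normal law of distribution a unique law possessing this property? (G. Pólya.)

Solution. Let $x_1$, $x_2$ be two variables for which the second moment of the
distribution exists, so that we can speak of their means and standard deviations. Let $x_1$
have its mean $a_1$ and its standard deviation $\sigma_1$; likewise, let $a_2$ and $\sigma_2$ be the mean and
the standard deviation of $x_2$. Three reduced variables

$$u_1 = \frac{x_1 - a_1}{\sigma_1}, \qquad u_2 = \frac{x_2 - a_2}{\sigma_2}, \qquad u_3 = \frac{x_1 + x_2 - a_1 - a_2}{\sqrt{\sigma_1^2+\sigma_2^2}}$$

have by hypothesis the same law of distribution. Hence, they have the same
characteristic function $\varphi(t)$, whence we can draw the conclusion that the characteristic
functions of $x_1$, $x_2$, $x_1 + x_2$ are, respectively,

$$\varphi_1(t) = e^{ia_1t}\varphi(\sigma_1 t), \qquad \varphi_2(t) = e^{ia_2t}\varphi(\sigma_2 t), \qquad \varphi_3(t) = e^{i(a_1+a_2)t}\varphi\!\left(\sqrt{\sigma_1^2+\sigma_2^2}\,t\right).$$

Since

$$\varphi_3(t) = \varphi_1(t)\varphi_2(t),$$

we must have for an arbitrary real t

$$\varphi(\sigma_1 t)\varphi(\sigma_2 t) = \varphi\!\left(\sqrt{\sigma_1^2+\sigma_2^2}\,t\right)$$

or

$$\varphi(\alpha t)\varphi(\beta t) = \varphi(t) \tag{1}$$

where

$$\alpha = \frac{\sigma_1}{\sqrt{\sigma_1^2+\sigma_2^2}}, \qquad \beta = \frac{\sigma_2}{\sqrt{\sigma_1^2+\sigma_2^2}}; \qquad \alpha^2 + \beta^2 = 1.$$

Since (1) holds for every real t, we shall have

$$\varphi(\alpha t) = \varphi(\alpha^2 t)\varphi(\alpha\beta t); \qquad \varphi(\beta t) = \varphi(\alpha\beta t)\varphi(\beta^2 t)$$

and

$$\varphi(t) = \varphi(\alpha^2 t)\,\varphi(\alpha\beta t)^{2}\,\varphi(\beta^2 t). \tag{2}$$

Applying (1) again to each of the factors in the right member of (2), we find that

$$\varphi(t) = \varphi(\alpha^3 t)\,\varphi(\alpha^2\beta t)^{3}\,\varphi(\alpha\beta^2 t)^{3}\,\varphi(\beta^3 t) \tag{3}$$

and proceeding in the same way, we arrive at the general formula

$$\varphi(t) = \varphi(\alpha^{n}t)^{p_0}\,\varphi(\alpha^{n-1}\beta t)^{p_1}\cdots\varphi(\beta^{n}t)^{p_n} \tag{4}$$

where $p_0, p_1, \ldots, p_n$ are coefficients in the expansion

$$(1 + x)^{n} = p_0 + p_1 x + \cdots + p_n x^{n}.$$

The arguments

$$v_0 = \alpha^{n}t, \quad v_1 = \alpha^{n-1}\beta t, \quad \ldots, \quad v_n = \beta^{n}t$$

tend uniformly to 0 since $\alpha < 1$, $\beta < 1$. The quotient

$$\frac{\varphi(v) - 1}{v^2} = -\int_{-\infty}^{\infty} x^2\,dF(x)\int_{0}^{1}(1-u)e^{ivxu}\,du,$$

where $F(x)$ is the reduced law of distribution, is represented by a uniformly
convergent integral; hence

$$\lim_{v\to 0}\frac{\varphi(v) - 1}{v^2} = -\frac{1}{2}$$

or

$$\varphi(v) = 1 + [-\tfrac{1}{2} + \epsilon(v)]v^2$$

where

$$\epsilon(v) \to 0 \quad\text{as}\quad v \to 0.$$

At the same time

$$\log\varphi(v) = [-\tfrac{1}{2} + \delta(v)]v^2 \qquad \text{(principal branch of log)}$$

where again

$$\delta(v) \to 0 \quad\text{as}\quad v \to 0.$$

Now, taking logarithms of both members of (4),

$$\log\varphi(t) = -\tfrac{1}{2}t^2\left(p_0\alpha^{2n} + p_1\alpha^{2n-2}\beta^2 + \cdots + p_n\beta^{2n}\right) + \Omega = -\tfrac{1}{2}t^2 + \Omega$$

where

$$\Omega = t^2\left[p_0\delta(v_0)\alpha^{2n} + p_1\delta(v_1)\alpha^{2n-2}\beta^2 + \cdots + p_n\delta(v_n)\beta^{2n}\right].$$

Given $\epsilon > 0$, we can take n so large that

$$|\delta(v_i)| < \epsilon; \qquad i = 0, 1, \ldots, n,$$

whence

$$|\Omega| < \epsilon t^2.$$

Thus

$$|\log\varphi(t) + \tfrac{1}{2}t^2| < \epsilon t^2$$

and since $\epsilon$ can be taken arbitrarily small,

$$\log\varphi(t) + \tfrac{1}{2}t^2 = 0, \qquad \varphi(t) = e^{-\frac{t^2}{2}},$$

which shows that the normal law is the only one with the required properties, among
all laws with finite second moments.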
CHAPTER XIV

FUNDAMENTAL LIMIT THEOREMS 


1. Bernoulli's theorem, as we have seen in Chap. VII, follows from a
more general one known as Laplace's limit theorem. In terms already
familiar to us, this theorem can be stated as follows: Let an event E
occur m times in a series of n independent trials with constant probability
p. As n becomes infinite, the distribution function of the quotient

$$\frac{m - np}{\sqrt{npq}}$$

approaches

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as a limit; or, to state it in a less precise form, the distribution of the
above quotient tends to normal.
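
The approach to the normal law can be observed numerically by computing the exact distribution function of the quotient for a moderately large n. The sketch below is only an illustration: the values n = 400 and p = 0.3, the use of the logarithm of the gamma function to avoid overflow, and the function names are assumptions, not part of the text.

```python
import math


def binomial_term(m, n, p):
    """Probability of exactly m occurrences in n trials, computed through
    logarithms so that large factorials cause no overflow."""
    log_term = (
        math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
        + m * math.log(p) + (n - m) * math.log(1.0 - p)
    )
    return math.exp(log_term)


def quotient_cdf(t, n, p):
    """Probability of the inequality (m - n p) / sqrt(n p q) < t."""
    cutoff = n * p + t * math.sqrt(n * p * (1.0 - p))
    return sum(binomial_term(m, n, p) for m in range(n + 1) if m < cutoff)


def normal_limit(t):
    """The limit (1 / sqrt(2 pi)) * integral from -infinity to t of exp(-u^2 / 2)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Assumed sample case: n = 400 trials with p = 0.3.
gaps = [abs(quotient_cdf(t, 400, 0.3) - normal_limit(t)) for t in (-1.0, 0.5, 1.5)]
```

Already for n = 400 the entries of `gaps` are of the order of 0.01 or less; the residual difference is mostly due to the discontinuous character of m.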

Just as Bernoulli's theorem itself is a very particular case of the general
law of large numbers, so Laplace's limit theorem is a special case of
another extremely general theorem, the discovery of which by Laplace
may be considered as the crowning achievement of his persistent efforts,
extending over a period of more than twenty years, to find the approximate
distribution of probability for sums consisting of a great many
independent components with almost arbitrary distributions. The
result at which Laplace finally arrived is as astonishing as it is simple:
if $x_1, x_2, \ldots, x_n$ ($E(x_i) = 0$, $i = 1, 2, \ldots, n$) are independent variables
(subject to some very mild limitations not stated, however, by Laplace)
and $B_n$ is the dispersion of their sum, then for large n the distribution of
the quotient

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

is nearly normal. To put it more precisely, the distribution function
of this quotient tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as n becomes infinite.

Laplace's attempt to prove this important proposition does not stand
the test of modern rigor and, besides, cannot easily be made rigorous.
The same is true of the attempts made by later investigators, notably
Poisson, Cauchy, and many others. Only after a lapse of many years
were truly rigorous proofs of Laplace's theorem given. This important
achievement is the result of the work of three great Russian mathematicians:
Tshebysheff (1887), Markoff (1898), and Liapounoff (1900-1901).
An account of Tshebysheff's and Markoff's ingenious investigations is
given in Appendix II. Here we shall follow Liapounoff; for his method
of proof has the advantage of simplicity even compared with more recent
proofs, of which that given by J. W. Lindeberg deserves special mention.¹

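2. Before going into details of analysis, we shall state the limit theorem
in a very general form due to Liapounoff.

Laplace-Liapounoff's Theorem. Let $x_1, x_2, \ldots, x_n$ be independent
variables with their means = 0, possessing absolute moments of the order
$2 + \delta$ (where $\delta$ is some number > 0):

$$\mu_{2+\delta}^{(1)}, \quad \mu_{2+\delta}^{(2)}, \quad \ldots, \quad \mu_{2+\delta}^{(n)}.$$

If, denoting by $B_n$ the dispersion of the sum $x_1 + x_2 + \cdots + x_n$, the
quotient

$$\omega_n = \frac{\mu_{2+\delta}^{(1)} + \mu_{2+\delta}^{(2)} + \cdots + \mu_{2+\delta}^{(n)}}{B_n^{1+\frac{\delta}{2}}}$$

tends to 0 as $n \to \infty$, the probability of the inequality

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}} < t$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$

It is natural that the complete proof of a theorem of such character
cannot be too short, and to make the proof clearer it is advisable to
divide it into logically separated parts.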
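
To see what Liapounoff's condition means in a simple case, the quotient $\omega_n$ can be computed for variables connected with independent trials. The sketch below assumes, for illustration only, that $x_k$ takes the value $q_k = 1 - p_k$ with probability $p_k$ and the value $-p_k$ with probability $q_k$, so that its mean is 0 and its dispersion is $p_kq_k$; the function name and the choice $\delta = 1$ are likewise assumptions.

```python
def liapounoff_quotient(ps, delta=1.0):
    """Quotient omega_n for variables x_k equal to q_k with probability p_k
    and to -p_k with probability q_k, where q_k = 1 - p_k."""
    moment_sum = 0.0  # sum of the absolute moments of order 2 + delta
    dispersion = 0.0  # B_n, the dispersion of the sum
    for p in ps:
        q = 1.0 - p
        moment_sum += p * q ** (2.0 + delta) + q * p ** (2.0 + delta)
        dispersion += p * q
    return moment_sum / dispersion ** (1.0 + delta / 2.0)


# Identical trials with p = 0.3: the quotient decreases like 1 / sqrt(n).
quotients = [liapounoff_quotient([0.3] * n) for n in (100, 10000)]
```

For identical trials and $\delta = 1$ the quotient is proportional to $1/\sqrt{n}$ (about 0.127 for n = 100 and 0.0127 for n = 10000), so the condition of the theorem is satisfied.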

3. The Fundamental Lemma. Let $s_n$ be a variable, depending on an
integer n, with the mean = 0 and the standard deviation = 1. If its
characteristic function

$$\varphi_n(v) = E(e^{is_nv})$$

tends to

$$e^{-\frac{v^2}{2}}$$

uniformly in any given finite interval $(-l, l)$, then the distribution function
$F_n(t)$ of $s_n$ tends uniformly (in the domain of all real values of t) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$

¹ Lindeberg's proof, as well as later proofs by P. Lévy and others, make use of an
ingenious artifice due to Liapounoff. Lindeberg explicitly acknowledges his indebtedness
to Liapounoff, while Lévy and other French writers fail to give due credit to the
great Russian mathematician.

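Proof. a. Together with the variable $s_n$, whose distribution function
is $F_n(t)$, Liapounoff considers another variable

$$r_n = s_n + y$$

where y is a normally distributed variable with the distribution function

$$G(y) = \frac{1}{h\sqrt{\pi}}\int_{-\infty}^{y} e^{-\frac{x^2}{h^2}}\,dx.$$

Denoting the distribution function of $r_n$ by $H_n(t)$, we have (Chap. XIII,
Sec. 7)

$$H_n(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} dF_n(x)\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du. \tag{1}$$

On account of the inequality

$$\frac{1}{\sqrt{\pi}}\int_{T}^{\infty} e^{-u^2}\,du \le \frac{1}{2}e^{-T^2}; \qquad T \ge 0$$

we have:

For $t - x < 0$:

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du = \frac{\theta'}{2}e^{-\left(\frac{t-x}{h}\right)^2}; \qquad 0 < \theta' \le 1.$$

For $t - x \ge 0$:

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\frac{t-x}{h}} e^{-u^2}\,du = 1 - \frac{1}{\sqrt{\pi}}\int_{\frac{t-x}{h}}^{\infty} e^{-u^2}\,du = 1 - \frac{\theta''}{2}e^{-\left(\frac{t-x}{h}\right)^2}; \qquad 0 < \theta'' \le 1.$$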

Hence, introducing these expressions into (1), 

//.(<) = 

where again 0 < < 1; 0 < < 1. This leads to the following 

inequality: 


But 


- F.(t)l < ij’ e (‘■»*)dF,(x). 

e-( 4 -T= 
and consequently


mt) - FM < 


= I e * e-''‘ 




(2) \Hn{t) - Fn(t)\ < ^IJ e * + 

+ 1 e ^ 

Here we split the first integral into three Ji, J 2 , t/ 3 , taken respectively 
between limits — 00 , — Z; —Z, Z; Z, +00 and denote the second integral 
_?! 

by Ji. Since |^«(v) — e 2| ^ 2, we shall have 


,T . ,. ^ * r* ^ 2 e * 

X S' 


because 


for positive x. Also 




-L * 

e Zrft; = —7= 


To estimate J 2 we shall denote by €n(Z) the maximum of l^n(w) — e ^ [ in 
the interval —l^v^L Then 

Finally, taking into account (2), (3), (4), and (5), we find 

(«)« 

(6) |H.«) - F.(OI < +^ + ^‘-ir- 

h. Expression (1) of Hn{t) can be transformed in a manner similar 
to that employed in Chap. XIII, Sec. 7, if we first write 

} ■ r * = 1 j—L_ r * 
Thus we get




or 




+ 

V 


- A -•*» ft 




Now 


Tjo V Tjo 


« ,1 . 


< 


X' 


’ -- 

ve ^dv = 

4ir 


since 


0<l-e-"‘<^’ 

4 


and consequently 


(7) 


ff.(0 


2 Tjo V I 4 t 2 tJ»« \v\ 

To find an upper bound of the integral in the right member, we split 
it into five integrals /i, /i, It, / 4 , I& taken respectively between limits 
— —J; —i, —X; —X, X; X, 1; I, +». To estimate /a, we notice 

that 

k,(«') - i| ^ = ? 



Ie-^ - 1| ^ 

and 

»• 


? 

1 

1 

IIA 

Hence 


(8) 


t* 

To estimate U + /i, we use the inequality |v>»(t>) — e 

get 


(9) 

2x‘ T Jx V y/ih\ 


Finally, dealing with h and h, we use the obvious inequality 


WM -e »| g 2 
and we obtain


-|/i + /. 


- "f 


-^dv ^ 4e * 

® V ^ T (uy 


Taking into account (7), (8), (9), and (10), the following inequality 
results: 


- 5 - ;J. 


-s-sm tv 

o 2 - 1 


4t 2t ^ ^ *• (W)* 


In it, since X is still at our disposal, we can take 

X = €n(Z)*A-*. 

The inequality thus obtained when combined with (6) gives (a == hi) 

-£.* _£L* 

El /i\ 1 if* I ^ 4c * , 2c*, a , 

(II) )».(,)- j + 

Here a and I are arbitrary positive numbers. We dispose of them in 
the following manner: Given an arbitrary positive number c, we take a 
so large as to have 

—5? —— 

4c * ^ 2 c * ^ 1 

T o* a 3* 

and after that we select I large enough to make 

a , ^ 1 

V8Z 3^* 

Finally, since for a fixed Z, €«(/) by hyoothesis, tends to 0 when n oo, 
there exists a number no such that 


(s + + 5-® < S* 

for all n > n©. The inequality (11) then shows that 
L 1 if" ~ sin tv. 


for n > no and this means that 


r p tt\ 1 j- 1 r" sin toj 1 f* - 5 »*j 
uniformly in t because the number $n_0$, as clearly follows from the preceding
analysis, depends upon $\epsilon$ only and not upon t.

Remark 1. Without changing anything in the proof, we can state
the fundamental lemma in a slightly generalized form as follows: If $t_n$
tends to the limit t, the probability of the inequality

$$s_n < t_n$$

tends to

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$

Remark 2. The fundamental lemma, although not explicitly stated
by Liapounoff, is implicitly contained in his proof. More general
propositions of the same nature have been published by Pólya and Lévy.
The very elegant result due to the latter can be stated as follows: If
the characteristic function $\varphi_n(t)$ of the variable $s_n$ tends to the characteristic function

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$$

of a fixed distribution uniformly in any finite interval, then

$$\lim_{n\to\infty} F_n(t) = F(t)$$

at any point of continuity of $F(t)$.

The above proof, corresponding to the particular case

$$F(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du,$$

can be used, almost without any changes, in proving the general proposition
of Lévy.

4. Proof of LiapounofPs Theorem, a. If Liapounoff’s condition 

+' - + . Q 

is satisfied for a certain 5 > 0, it will be satisfied for all smaller 6. 

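Let $f_i(t)$ be the distribution function of $x_i$ $(i = 1, 2, \ldots n)$. The
sum

$$f(t) = f_1(t) + f_2(t) + \cdots + f_n(t)$$

being a nondecreasing function of $t$, the following inequality holds
(Chap. XIII, Sec. 5):

$$\left(\int_{-\infty}^{\infty}|t|^{b}\,df(t)\right)^{a-c} \le \left(\int_{-\infty}^{\infty}|t|^{c}\,df(t)\right)^{a-b}\left(\int_{-\infty}^{\infty}|t|^{a}\,df(t)\right)^{b-c}$$

provided $a > b > c > 0$. We take here

$$a = 2 + \delta, \quad b = 2 + \delta', \quad c = 2$$

supposing $0 < \delta' < \delta$. Then, since $\int t^2\,df(t) = B_n$,

$$\left(\sum_{i=1}^{n} E|x_i|^{2+\delta'}\right)^{\delta} \le B_n^{\delta-\delta'}\left(\sum_{i=1}^{n} E|x_i|^{2+\delta}\right)^{\delta'}.$$

But this inequality is equivalent to

$$\left(\frac{\sum_{i=1}^{n} E|x_i|^{2+\delta'}}{B_n^{1+\frac{\delta'}{2}}}\right)^{\frac{1}{\delta'}} \le \left(\frac{\sum_{i=1}^{n} E|x_i|^{2+\delta}}{B_n^{1+\frac{\delta}{2}}}\right)^{\frac{1}{\delta}}$$

and it shows that

$$\frac{\sum_{i=1}^{n} E|x_i|^{2+\delta'}}{B_n^{1+\frac{\delta'}{2}}} \to 0 \quad \text{if} \quad \frac{\sum_{i=1}^{n} E|x_i|^{2+\delta}}{B_n^{1+\frac{\delta}{2}}} \to 0,$$

provided $0 < \delta' < \delta$. Hence, in the proof we can assume that the fundamental
condition is satisfied for some positive $\delta \le 1$.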

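b. Liapounoff's inequality (Chap. XIII, Sec. 5) with $c = 0$, $b = 2$,
$a = 2 + \delta$ when applied to $x_i$ gives

$$b_i^{1+\frac{\delta}{2}} \le E|x_i|^{2+\delta}; \quad b_i = E(x_i^2).$$

Hence,

$$(12)\qquad \left(\frac{b_i}{B_n}\right)^{1+\frac{\delta}{2}} \le \omega_n$$

and, since it is assumed that $\omega_n \to 0$, all the quotients

$$\frac{b_i}{B_n} = \frac{b_i}{b_1 + b_2 + \cdots + b_n} \qquad (i = 1, 2, \ldots n)$$

will converge to 0 uniformly as $n \to \infty$.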
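c. The following formula can easily be obtained by means of integration
by parts:

$$e^{ix} = 1 + ix - \frac{x^2}{2} - x^2\int_0^1 (e^{ixt} - 1)(1 - t)\,dt.$$

If $x$ is real and in absolute value $> 2$, we have

$$\left|x^2\int_0^1 (e^{ixt} - 1)(1 - t)\,dt\right| \le x^2 < \frac{|x|^{2+\delta}}{2^{\delta}}$$

since

$$|e^{ixt} - 1| \le 2.$$

If $|x| \le 2$, we can use the inequality

$$|e^{ixt} - 1| \le |x|t$$

and find (remembering that $\delta \le 1$)

$$\left|x^2\int_0^1 (e^{ixt} - 1)(1 - t)\,dt\right| \le \frac{|x|^3}{6} \le \frac{|x|^{2+\delta}}{3\cdot 2^{\delta}} < \frac{|x|^{2+\delta}}{2^{\delta}}.$$

Thus, for every real $x$

$$e^{ix} = 1 + ix - \frac{x^2}{2} + \theta\frac{|x|^{2+\delta}}{2^{\delta}}; \quad |\theta| \le 1.$$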
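
The bound just obtained can be checked numerically. The following is only a sketch: the grid of values of $x$ and $\delta$ is an arbitrary choice made here, and the function names `remainder` and `bound` are illustrative, not taken from the text.

```python
import cmath

def remainder(x: float) -> float:
    """|e^{ix} - (1 + ix - x^2/2)|, the quantity bounded in the text."""
    return abs(cmath.exp(1j * x) - (1 + 1j * x - x * x / 2))

def bound(x: float, delta: float) -> float:
    """Right-hand bound |x|^(2+delta) / 2^delta, valid for 0 < delta <= 1."""
    return abs(x) ** (2 + delta) / 2 ** delta

# Scan x in [-20, 20] and delta in {0.1, ..., 1.0}; the grid is an assumption.
worst = max(
    remainder(k / 100) - bound(k / 100, d / 10)
    for k in range(-2000, 2001)
    for d in range(1, 11)
)
print("largest value of remainder - bound:", worst)
```

A non-positive value of `worst` (up to rounding) is what the inequality predicts; such a scan illustrates the statement but of course does not prove it.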

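Substituting here

$$x = \frac{t x_k}{\sqrt{B_n}}$$

and taking the mathematical expectation of both members, we have

$$(13)\qquad \varphi_k(t) = E\left(e^{\frac{itx_k}{\sqrt{B_n}}}\right) = 1 - \frac{b_k t^2}{2B_n} + \theta_k\frac{|t|^{2+\delta}E|x_k|^{2+\delta}}{2^{\delta}B_n^{1+\frac{\delta}{2}}}; \quad |\theta_k| \le 1.$$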




Furthermore, since 


1 - a; = 6-* - x > 0; 0 < 0 < 1, 


we can write 
(14) 

If WnW*"’’* < 1, we shall have, by virtue of (12), 


1 _ At* = 

2bJ 2\2bJ 
and consequently





This inequality, together with (13) and (14), leads to the following 
expression of ipk(t ); 

(15) Mt) = + <r*) 

where 


(16) 


(*) 


9 

8^ i+- 

Bn ^ 


|<r*l < < 3 




B*+l 


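d. The characteristic function of the variable

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

is

$$\varphi(t) = \varphi_1(t)\varphi_2(t)\cdots\varphi_n(t)$$

because $x_1, x_2, \ldots x_n$ are independent variables. Hence, by (15)

$$\varphi(t) = e^{-\frac{t^2}{2}}(1 + \sigma_1)(1 + \sigma_2)\cdots(1 + \sigma_n),$$

whence

$$\left|\varphi(t) - e^{-\frac{t^2}{2}}\right| \le (1 + |\sigma_1|)(1 + |\sigma_2|)\cdots(1 + |\sigma_n|) - 1 < e^{|\sigma_1| + |\sigma_2| + \cdots + |\sigma_n|} - 1
$$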


and 

(17) W(t) - - 1 

taking into account inequalities (16). Inequality (17) holds if 

< 1 . 

Suppose, now, that $t$ is confined to an arbitrary finite interval

$$-l \le t \le l.$$

Because $\omega_n$, by hypothesis, tends to 0, the difference

- 1 


will tend to 0 as $n \to \infty$. In connection with (17) this shows that

$$\varphi(t) \to e^{-\frac{t^2}{2}}$$

uniformly in any finite interval. It suffices now to invoke the fundamental
lemma to complete the proof of Liapounoff's theorem.

5. Particular Cases. This theorem is extremely general and it is
hardly possible to find cases of any practical importance to which it
could not be applied. Two particularly significant cases deserve special
mention.

First Case. Let us suppose that variables $x_1, x_2, \ldots x_n$ are bounded,
so that any possible value of any one of them is absolutely less than a
constant $C$. Evidently

$$E|x_i|^{2+\delta} \le C^{\delta}E(x_i^2) = C^{\delta}b_i$$

and hence

$$\omega_n \le \frac{C^{\delta}}{B_n^{\frac{\delta}{2}}}.$$

It suffices to assume that

$$B_n = b_1 + b_2 + \cdots + b_n$$

tends to infinity to be sure that $\omega_n \to 0$. Hence, dealing with bounded
independent variables, the condition for the validity of the limit theorem
is

$$B_n \to \infty \quad \text{as} \quad n \to \infty,$$

which is equivalent to the statement that the series

$$b_1 + b_2 + b_3 + \cdots$$

is divergent.

Poisson's series of trials affords a good illustration of this case. In
the usual way, we attach to each of the trials a variable which assumes
two values, 1 and 0, according as an event $E$ occurs or fails in that trial.
Let $p_i$ and $q_i = 1 - p_i$ be the respective probabilities of the occurrence
and failure of $E$ in the $i$th trial. The variable $z_i$ attached to this trial
is defined by

$z_i = 1$ if $E$ occurs,

$z_i = 0$ if $E$ fails.

Noticing that

$$E(z_i) = p_i,$$

we introduce new variables

$$x_i = z_i - p_i \qquad (i = 1, 2, \ldots n)$$

with the mean 0, whose sum is given by

$$m - np$$

where $m$ is the number of occurrences of $E$ in $n$ trials and $p$ the mean
probability

$$p = \frac{p_1 + p_2 + \cdots + p_n}{n}.$$

In our case

$$b_i = p_i q_i$$

and

$$B_n = p_1 q_1 + p_2 q_2 + \cdots + p_n q_n.$$

Hence, we can formulate the following theorem:

Theorem. The probability of the inequality

$$m - np < t\sqrt{B_n}$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$, provided the series

$$p_1 q_1 + p_2 q_2 + p_3 q_3 + \cdots$$

is divergent. At the same time the probability of the inequalities

$$t_1\sqrt{B_n} < m - np < t_2\sqrt{B_n}$$

tends uniformly (in $t_1$, $t_2$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du.$$
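
The theorem on Poisson's series of trials can be illustrated by an exact computation. The sketch below builds the distribution of $m$ by repeated convolution and compares the probability of $m - np < t\sqrt{B_n}$ with the normal integral. The alternating probabilities 0.3 and 0.7 and the number of trials are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math

def poisson_trials_pmf(probs):
    """Exact distribution of the number m of occurrences in independent trials."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, w in enumerate(dist):
            nxt[k] += w * (1 - p)      # event fails in this trial
            nxt[k + 1] += w * p        # event occurs in this trial
        dist = nxt
    return dist

def normal_cdf(t):
    """(1/sqrt(2 pi)) * integral from -infinity to t of exp(-u^2/2) du."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def compare(probs, t):
    """Return (exact probability of m - np < t sqrt(B_n), normal limit)."""
    mean = sum(probs)                       # equals n times the mean probability p
    b_n = sum(p * (1 - p) for p in probs)   # B_n = p_1 q_1 + ... + p_n q_n
    pmf = poisson_trials_pmf(probs)
    exact = sum(w for m, w in enumerate(pmf) if m - mean < t * math.sqrt(b_n))
    return exact, normal_cdf(t)

# Illustrative series of trials: p_i alternates between 0.3 and 0.7.
probs = [0.3 if i % 2 == 0 else 0.7 for i in range(400)]
for t in (-1.0, 0.0, 1.0, 2.0):
    exact, limit = compare(probs, t)
    print(t, round(exact, 4), round(limit, 4))
```

Since $m$ takes only integral values, the exact probability differs from the limit by a quantity of the order $1/\sqrt{B_n}$, so only rough agreement should be expected for moderate $n$.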


Second Case. Let $z_1, z_2, \ldots z_n$ be identical variables with the
common mean $a$ and dispersion $b$. Supposing that for some positive $\delta$

$$E|z_i - a|^{2+\delta} = c$$

exists, we have

$$\omega_n = \frac{nc}{(nb)^{1+\frac{\delta}{2}}} = \frac{c}{b^{1+\frac{\delta}{2}}}\,n^{-\frac{\delta}{2}},$$

and hence $\omega_n \to 0$ as $n \to \infty$. The limit theorem applied to this case
can be stated as follows:

The probability of the inequality

$$z_1 + z_2 + \cdots + z_n - na < t\sqrt{nb}$$

tends uniformly to

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

provided

$$E|z_i - a|^{2+\delta}$$


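exists for some positive $\delta$. As a corollary we have: The probability of the
inequalities

$$-t\sqrt{\frac{b}{n}} < \frac{z_1 + z_2 + \cdots + z_n}{n} - a < t\sqrt{\frac{b}{n}}$$

tends to

$$\frac{2}{\sqrt{2\pi}}\int_{0}^{t} e^{-\frac{u^2}{2}}\,du.$$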



This proposition is regarded as justification of the ordinary procedure
of taking a mean of several observed measurements of the same quantity,
made under the same conditions, to approximate its "true value."
Barring systematical errors which should be eliminated by a careful
study of the tools used for measurements, the true value of the unknown
quantity is regarded as coinciding with the expectation of a set of potentially
possible values each having a certain probability of materializing
in actual measurement. Since for comparatively small $t$ the above
integral comes very near to 1 and

$$t\sqrt{\frac{b}{n}}$$

for large $n$ becomes as small as we please, the probability of the mean of a
very large number of observations deviating very little from the true
value of the quantity to be measured, will be close to 1 and herein lies
the justification of the rule of mean mentioned above.


Estimation of the Error Term

6. The limit theorem is a proposition of an essentially asymptotic
character. It states merely that the distribution function $F_n(t)$ of the
variable

$$s_n = \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{B_n}}$$

approaches the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

as $n$ becomes infinite when a certain condition is fulfilled. For practical
purposes it is very important to estimate the error committed by replacing
$F_n(t)$ by its limit when $n$ is a finite but very large number. In his
original paper Liapounoff had this important problem in his mind and
for that reason entered into more detailed elaboration of various parts
of his proof than was strictly necessary to establish an asymptotic
theorem.

We do not intend to reproduce here this part of Liapounoff's investigation;
it suffices to indicate the final result. Assuming the existence of
absolute moments of the third order $E|x_i|^3$; $i = 1, 2, \ldots n$, we shall
suppose $n$ so large that

$$\omega_n = \frac{E|x_1|^3 + E|x_2|^3 + \cdots + E|x_n|^3}{B_n^{\frac{3}{2}}} \le \frac{1}{20}.$$

Then, setting

$$F_n(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du + R$$

we shall have

|fi| < |.„[(log gLy + 1.1 j + log 

Although this limit for the error term is probably too high, it seems
to be the best available. However, it is greatly desirable to have a more
genuine estimation of $R$.

7. Hypothesis of Elementary Errors. It is considered as an experimental
fact that accidental errors of observations (or measurements)
follow closely the law of normal distribution. In the sphere of biology, 
similar phenomena have been observed as to the size of the bodies and 
various organs of living organisms. What can be suggested as an 
explanation of these observed facts? In regard to errors of observations, 
Laplace proposed a hypothesis which may sound plausible. He considers 
the total error as a sum of numerous very small elementary errors due 
to independent causes. 

It can hardly be doubted that various independent or nearly independent
causes contribute to the total error. In astronomical observations,
for instance, slight changes in the temperature, irregular currents
of air, vibrations of buildings, and even the state of the organs of perception
of an observer may be considered as but a small part of such causes.
One can easily understand that the growth of the organs of living organisms
is also dependent on many factors of accidental character which
independently tend to increase or decrease the size of the organs. If,
on the ground of such evidence, we accept Laplace's hypothesis, we can
try the explanation of the normal law of distribution on the basis of the
general theorems established above.

Suppose that elementary errors do not exceed in absolute value a
certain number $l$, very small compared with the standard deviation $\sigma$
of their sum. The quantity denoted by $\omega_n$ in the preceding section will
be less than the ratio $l/\sigma$ and hence will be a small number; and the same
will be true of the error term $R$. Hence, the distribution of the total
error will be nearly normal.
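
The statement that $\omega_n$ is less than $l/\sigma$ can be made concrete with a small computation. In the sketch below the elementary errors are assumed, purely for illustration, to take the two values $+e_k$ and $-e_k$ with probability 1/2 each; the sizes $e_k$ are invented numbers, and `omega_n` is a hypothetical helper for the case $\delta = 1$.

```python
import math

def omega_n(third_abs_moments, variances):
    """Liapounoff ratio for delta = 1: (sum of E|x_k|^3) / B_n^(3/2)."""
    b_n = sum(variances)
    return sum(third_abs_moments) / b_n ** 1.5

# Hypothetical elementary errors: x_k = +e_k or -e_k, each with probability 1/2.
sizes = [0.01 * (1 + (k % 5)) for k in range(1000)]
l = max(sizes)                        # bound on the absolute value of each error
variances = [e * e for e in sizes]    # E(x_k^2) = e_k^2
third = [e ** 3 for e in sizes]       # E|x_k|^3 = e_k^3
sigma = math.sqrt(sum(variances))     # standard deviation of the total error

print("omega_n =", omega_n(third, variances))
print("l/sigma =", l / sigma)
```

The inequality follows from $E|x_k|^3 \le l\,E(x_k^2)$, which is all the code relies on; with a thousand such errors both quantities are already of the order of a few hundredths.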

Laplace's explanation of the observed prevalence of normal distributions
may be accepted as plausible, at least. But the question may be
raised whether elementary errors are small enough and numerous enough
to make the difference between the true distribution function of the total
error and that of a normal distribution small. Besides, Laplace's
hypothesis is based on the principle of superposition of small effects and
thus introduces another assumption of an arbitrary character.

Finally, the experimental data quoted in support of the normal distribution
of errors of observations and biological measurements are not
numerous enough for one to place full confidence in them. Hence, the
widely accepted statistical theories based on the normal law of distribution
cannot be fully relied on and may be considered merely as substitutes
for more accurate knowledge which we do not yet possess in dealing with
problems of vital importance in the sphere of human activities.

Limit Theorems for Dependent Variables 

8. The fundamental limit theorem can be extended to sums of dependent
variables as, under special assumptions, was shown first by Markoff
and later by S. Bernstein, whose work may be considered an outstanding
recent contribution to the theory of probability. However, the conditions
for the validity of the theorems established by Bernstein are rather
complicated, and the whole subject seems to lack ultimate simplicity.
For that reason we confine ourselves here to a few special cases.

Example 1. Let us consider a simple chain in which probabilities for an event $E$
to occur in any trial are $p'$ and $p''$, respectively, according as $E$ occurred or failed in
the preceding trial. The probability for $E$ to occur at the $n$th trial when the results of
other trials are unknown is

$$p_n = p + (p_1 - p)\delta^{n-1}$$

where $p_1$ is the initial probability, $\delta = p' - p''$ and

$$p = \frac{p''}{1 - \delta}.$$

The mean probability for $n$ trials is given by

$$\bar{p}_n = p + \frac{p_1 - p}{n}\cdot\frac{1 - \delta^n}{1 - \delta}$$

so that $p$ may be considered as the mean probability in infinitely many trials.

In the usual way, to trials 1, 2, 3, ... we attach variables $x_1, x_2, x_3, \ldots$ so that
in general

$$x_i = 1 - p_i \quad \text{or} \quad x_i = -p_i$$

according as $E$ occurs or fails in the $i$th trial. If $m$ is the number of occurrences of
$E$ in $n$ trials, the sum

$$x_1 + x_2 + \cdots + x_n$$

of dependent variables represents

$$m - n\bar{p}_n.$$

Evidently

$$E(m - n\bar{p}_n) = 0$$

and, as we have seen in Chap. XI, Sec. 7,

$$B_n = E(m - n\bar{p}_n)^2 \sim npq\,\frac{1 + \delta}{1 - \delta};$$

that is, the ratio of

$$B_n : npq\,\frac{1 + \delta}{1 - \delta}$$

tends to 1 as $n$ becomes infinite.
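
The closed expressions for $p_n$ and for the mean probability can be compared with the step-by-step recurrence $p_{k+1} = p_k p' + (1 - p_k)p''$ that defines the chain. This is a minimal sketch; the numerical values of $p_1$, $p'$, $p''$ are arbitrary and the function name is hypothetical.

```python
def occurrence_probabilities(p1, p_after_e, p_after_f, n):
    """p_k for k = 1..n from the recurrence p_{k+1} = p_k p' + (1 - p_k) p''."""
    out = [p1]
    for _ in range(n - 1):
        out.append(out[-1] * p_after_e + (1 - out[-1]) * p_after_f)
    return out

p1, pp, ppp = 0.9, 0.7, 0.4      # illustrative values of p_1, p', p''
delta = pp - ppp                 # delta = p' - p''
p = ppp / (1 - delta)            # limiting probability p = p''/(1 - delta)
n = 50

probs = occurrence_probabilities(p1, pp, ppp, n)
closed = [p + (p1 - p) * delta ** (k - 1) for k in range(1, n + 1)]
mean_closed = p + (p1 - p) / n * (1 - delta ** n) / (1 - delta)

print(max(abs(u - v) for u, v in zip(probs, closed)))
print(sum(probs) / n, mean_closed)
```

Both printed comparisons should agree to rounding error, and the mean probability visibly approaches $p$ as $n$ grows.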


In order to find an appropriate expression of the characteristic function of the
quotient

$$\frac{m - n\bar{p}_n}{\sqrt{B_n}}$$

we shall endeavor first to find the generating function $\omega_n(t)$ for probabilities

$$P_{m,n} \qquad (m = 0, 1, 2, \ldots n)$$

to have exactly $m$ occurrences of $E$ in $n$ trials. Let $A_{m,n}$ be the probability of $m$
occurrences when the whole series ends with $E$ and similarly $B_{m,n}$ the probability of
$m$ occurrences when this series ends with $F$, the event opposite to $E$. The following
relations follow immediately from the definition of a chain

$$(18)\qquad \begin{aligned} A_{m,n+1} &= A_{m-1,n}\,p' + B_{m-1,n}\,p'' \\ B_{m,n+1} &= A_{m,n}\,q' + B_{m,n}\,q''. \end{aligned}$$

Let

$$\theta_n(t) = \sum_{m=0}^{n} A_{m,n}t^m; \qquad \psi_n(t) = \sum_{m=0}^{n} B_{m,n}t^m$$

be the generating functions of $A_{m,n}$ and $B_{m,n}$. From relations (18) it follows that

$$(19)\qquad \begin{aligned} \theta_{n+1}(t) &= p't\,\theta_n(t) + p''t\,\psi_n(t) \\ \psi_{n+1}(t) &= q'\theta_n(t) + q''\psi_n(t). \end{aligned}$$

These relations established for $n \ge 1$ will hold even for $n = 0$ if we define $\theta_0(t)$ and
$\psi_0(t)$ by

$$p'\theta_0 + p''\psi_0 = p_1$$

$$q'\theta_0 + q''\psi_0 = 1 - p_1$$

whence

$$\theta_0 + \psi_0 = 1.$$


From (19) one can easily conclude that both $\theta_n(t)$ and $\psi_n(t)$ satisfy the same equation
in finite differences of the second order

$$\theta_{n+2} - (p't + q'')\theta_{n+1} + \delta t\,\theta_n = 0$$

$$\psi_{n+2} - (p't + q'')\psi_{n+1} + \delta t\,\psi_n = 0.$$

Evidently

$$P_{m,n} = A_{m,n} + B_{m,n},$$

so that

$$\omega_n(t) = \theta_n(t) + \psi_n(t)$$

satisfies the equation

$$(20)\qquad \omega_{n+2} - (p't + q'')\omega_{n+1} + \delta t\,\omega_n = 0$$

and is completely determined by it and the initial conditions

$$\omega_0 = 1, \quad \omega_1 = q_1 + p_1 t.$$

Since

$$p' = p + q\delta, \quad q'' = q + p\delta$$

the characteristic equation corresponding to (20) can be written

$$(\zeta - 1)(\zeta - \delta) = (t - 1)[(p + q\delta)\zeta - \delta]$$

and for small $t - 1$ its roots can be expanded into power series

$$\zeta_1 = 1 + c_1(t - 1) + c_2(t - 1)^2 + \cdots$$

$$\zeta_2 = \delta + d_1(t - 1) + d_2(t - 1)^2 + \cdots.$$

The general expression of $\omega_n(t)$ will be

$$\omega_n(t) = A\zeta_1^n + B\zeta_2^n$$

where to satisfy the initial conditions we must take

$$A = \frac{\zeta_2 - q_1 - p_1 t}{\zeta_2 - \zeta_1}, \qquad B = \frac{-\zeta_1 + q_1 + p_1 t}{\zeta_2 - \zeta_1}.$$
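
Relations (18) lend themselves directly to computation. The sketch below iterates them to obtain the exact probabilities $P_{m,n}$ for a simple chain and compares the resulting dispersion $B_n$ with its asymptotic value $npq(1+\delta)/(1-\delta)$. The numerical values of $p_1$, $p'$, $p''$ and $n$ are assumptions made for illustration only, and the function name is hypothetical.

```python
def chain_count_distribution(p1, pp, ppp, n):
    """Return [P_{0,n}, ..., P_{n,n}] by iterating relations (18).

    a[m] is A_{m,k} (series ends with E), b[m] is B_{m,k} (series ends with F).
    """
    a = [0.0, p1]          # after one trial: E occurred, m = 1
    b = [1 - p1, 0.0]      # after one trial: E failed, m = 0
    for _ in range(n - 1):
        na = [0.0] * (len(a) + 1)
        nb = [0.0] * (len(a) + 1)
        for m in range(len(a)):
            na[m + 1] += a[m] * pp + b[m] * ppp            # next trial gives E
            nb[m] += a[m] * (1 - pp) + b[m] * (1 - ppp)    # next trial gives F
        a, b = na, nb
    return [x + y for x, y in zip(a, b)]

p1, pp, ppp, n = 0.5, 0.7, 0.4, 500     # illustrative p_1, p', p'' and n
delta = pp - ppp
p = ppp / (1 - delta)
q = 1 - p

pmf = chain_count_distribution(p1, pp, ppp, n)
mean = sum(m * w for m, w in enumerate(pmf))
b_n = sum((m - mean) ** 2 * w for m, w in enumerate(pmf))
ratio = b_n / (n * p * q * (1 + delta) / (1 - delta))
print("B_n =", b_n, " ratio to asymptotic value =", ratio)
```

The ratio differs from 1 by a quantity of the order $1/n$, in agreement with the statement that $B_n$ is asymptotic to $npq(1+\delta)/(1-\delta)$.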


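Having found $\omega_n(t)$, the characteristic function of

$$\frac{m - n\bar{p}_n}{\sqrt{B_n}}$$

will be given by

$$\varphi_n(v) = e^{-\frac{in\bar{p}_n v}{\sqrt{B_n}}}\,\omega_n\!\left(e^{\frac{iv}{\sqrt{B_n}}}\right).$$

To study the asymptotic behavior of $\varphi_n(v)$ when $v$ is confined to a finite fixed
interval $-l \le v \le l$, we notice that then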

will be well within the convergence region of the series we are going to consider now.
By means of Lagrange's series or otherwise, we find the following expansion of $\log \zeta_1$ in
power series of $t - 1$

$$\log \zeta_1 = p(t - 1) - \left(\frac{p^2}{2} - \frac{pq\delta}{1 - \delta}\right)(t - 1)^2 + \cdots$$

convergent for sufficiently small values of $t - 1$. By setting $t = e^{iu}$ we obtain another
power series in $u$

$$\log \zeta_1 = piu - \frac{pq}{2}\cdot\frac{1 + \delta}{1 - \delta}\,u^2 + \cdots$$



convergent for sufficiently small u. Hence 




l+Ju* - , . 
nim» - (m) 


- niHrj^ ^ - ntt»a(M) 


where g{u) is a bounded function of u, u being contained in a certain interval (~r, r). 
By substituting 


here, we easily conclude that 


tends uniformly to the limit 


II « “7= 

VBn 

e 


V* 



in the interval —I ^ v ^ I while 


e 



remains there uniformly bounded. Since, as can easily be seen, $A$ and $B$ can be
represented by power series

$$A = 1 + a_1 u + a_2 u^2 + \cdots$$

$$B = -a_1 u - a_2 u^2 - \cdots$$

$A$ tends uniformly to 1 and $B$ tends uniformly to 0. Hence, finally, $\varphi_n(v)$ in any fixed
interval $-l \le v \le l$ tends uniformly to $e^{-\frac{v^2}{2}}$. It suffices to apply the fundamental
lemma to conclude that the probability of the inequality

$$m - n\bar{p}_n < t_n\sqrt{B_n}$$

tends uniformly to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du$$

if $t_n$ tends to $t$.


Since $B_n$ is asymptotic to $npq\,\dfrac{1 + \delta}{1 - \delta}$ and $\bar{p}_n$ differs from $p$ by a quantity of the order
$1/n$, the inequality

$$m - np < t\sqrt{npq\,\frac{1 + \delta}{1 - \delta}}$$

can be written in the form

$$m - n\bar{p}_n < t_n\sqrt{B_n}$$

with $t_n$ tending to $t$, whence, using the above established result, the following theorem
due to Markoff can be derived:

Theorem. For a simple chain the probability of the inequalities

$$t_1\sqrt{npq\,\frac{1 + \delta}{1 - \delta}} < m - np < t_2\sqrt{npq\,\frac{1 + \delta}{1 - \delta}}$$

tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$.

Example 2. Considering an indefinite series of Bernoullian trials with the probability
$p$ for an event $A$ to occur, we can regard pairs of consecutive trials 1 and 2,
2 and 3, 3 and 4, and so on, as forming a new series of trials which may produce an
event $E$ consisting of two successive occurrences of $A$ ($E = AA$) or an event $F$ opposite
to $E$ ($F = AB, BA, BB$). With respect to $E$ the trials of the new series are no longer
independent. Let $m$ be the number of occurrences of $E$ in $n$ trials. Then

$$E(m - np^2) = 0$$

and

$$B_n = E(m - np^2)^2 = np^2q(1 + 3p) - 2p^3q$$

as was shown in Chap. XI, Sec. 6.

Let $P_{m,n}$ be the probability of exactly $m$ occurrences of $E$ in a series of $n$ trials.
Evidently

$$P_{m,n} = A_{m,n} + B_{m,n}$$

where $A_{m,n}$ and $B_{m,n}$ are the probabilities of $m$ occurrences of $E$ when the Bernoullian
series of $n + 1$ trials ends with $A$ or $B$, respectively. By an easy application of the
theorems of total and compound probabilities we get

$$A_{m,n+1} = A_{m-1,n}\,p + B_{m,n}\,p$$

$$B_{m,n+1} = A_{m,n}\,q + B_{m,n}\,q.$$

Corresponding to these relations the generating functions

$$\theta_n(t) = \sum_{m=0}^{n} A_{m,n}t^m; \qquad \psi_n(t) = \sum_{m=0}^{n} B_{m,n}t^m$$

satisfy the following equations in finite differences:

$$\theta_{n+1} = pt\,\theta_n + p\,\psi_n$$

$$\psi_{n+1} = q\,\theta_n + q\,\psi_n$$

holding even for $n = 0$ if we set $\theta_0 = p$, $\psi_0 = q$. Hence, it follows that $\theta_n(t)$ and
$\psi_n(t)$ satisfy the same equations of the second order

$$\theta_{n+2} - (pt + q)\theta_{n+1} + pq(t - 1)\theta_n = 0$$

$$\psi_{n+2} - (pt + q)\psi_{n+1} + pq(t - 1)\psi_n = 0$$

and so does their sum

$$\omega_n(t) = \theta_n(t) + \psi_n(t) = \sum_{m=0}^{n} P_{m,n}t^m.$$
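Thus, to determine $\omega_n(t)$ we have the equation

$$\omega_{n+2} - (pt + q)\omega_{n+1} + pq(t - 1)\omega_n = 0$$

and the initial conditions

$$\omega_0 = 1, \quad \omega_1 = 1 - p^2 + p^2 t.$$

The general expression of $\omega_n(t)$ is

$$\omega_n(t) = A\zeta_1^n + B\zeta_2^n$$

where $\zeta_1$ and $\zeta_2$ are roots of the equation

$$\zeta^2 - \zeta = p(t - 1)(\zeta - q)$$

and

$$A = \frac{\zeta_2 - 1 - p^2(t - 1)}{\zeta_2 - \zeta_1}, \qquad B = \frac{-\zeta_1 + 1 + p^2(t - 1)}{\zeta_2 - \zeta_1}.$$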

If $\zeta_1$ is the root which for $t = 1$ reduces to 1, we easily find the following series

$$\log \zeta_1 = p^2(t - 1) + \frac{p^2}{2}(-p^2 + 2pq)(t - 1)^2 + \cdots$$

or, setting $t = e^{iu}$ and supposing $u$ sufficiently small,

$$\log \zeta_1 = ip^2u - \frac{p^2q(1 + 3p)}{2}\,u^2 + \cdots.$$

As to $A$ and $B$, they can be developed into series of the form

$$A = 1 + cu^2 + \cdots$$

$$B = -cu^2 + \cdots.$$

Hence reasoning in the same manner as in Example 1, we can conclude that the
characteristic function

$$\varphi_n(v) = e^{-\frac{inp^2v}{\sqrt{B_n}}}\,\omega_n\!\left(e^{\frac{iv}{\sqrt{B_n}}}\right)$$

of the variable

$$\frac{m - np^2}{\sqrt{B_n}}$$

tends to the limit $e^{-\frac{v^2}{2}}$ uniformly in any finite and fixed interval $-l \le v \le l$. Referring,
finally, to the fundamental lemma, we reach the following conclusion: The
probability of the inequalities

$$t_1\sqrt{np^2q(1 + 3p)} < m - np^2 < t_2\sqrt{np^2q(1 + 3p)}$$

tends uniformly (with respect to $t_1$ and $t_2$) to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{u^2}{2}}\,du$$

as $n \to \infty$.
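
The mean $np^2$ and the dispersion $B_n = np^2q(1+3p) - 2p^3q$ used in Example 2 can be verified by iterating the relations for $A_{m,n}$ and $B_{m,n}$ given there. The sketch below is one possible arrangement of that computation; the values of $p$ and $n$ are arbitrary and the function name is hypothetical.

```python
def pair_count_distribution(p, n):
    """Distribution of m, the number of pairs AA among n overlapping pairs
    formed by n + 1 Bernoullian trials."""
    q = 1 - p
    a = [p]      # theta_0 = p: first trial gave A, no pair counted yet
    b = [q]      # psi_0 = q: first trial gave B
    for _ in range(n):
        na = [0.0] * (len(a) + 1)
        nb = [0.0] * (len(a) + 1)
        for m in range(len(a)):
            na[m + 1] += a[m] * p          # A after A: one more occurrence of E
            na[m] += b[m] * p              # A after B: no new occurrence
            nb[m] += (a[m] + b[m]) * q     # B after anything: no new occurrence
        a, b = na, nb
    return [x + y for x, y in zip(a, b)]

p, n = 0.4, 200        # illustrative values
q = 1 - p
pmf = pair_count_distribution(p, n)
mean = sum(m * w for m, w in enumerate(pmf))
var = sum((m - mean) ** 2 * w for m, w in enumerate(pmf))
print(mean, n * p * p)
print(var, n * p * p * q * (1 + 3 * p) - 2 * p ** 3 * q)
```

Each printed pair should agree to rounding error, since both formulas are exact for every $n$.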
Problems for Solution

1. Consider a series of independent variables $x_1, x_2, x_3, \ldots$ where in general
$x_k$ $(k = 1, 2, 3, \ldots)$ can have only two values $k^{\alpha}$ and $-k^{\alpha}$ each with the probability
$\frac{1}{2}$. Show that the limit theorem holds for the variables thus defined if $\alpha > -\frac{1}{2}$,
but the law of large numbers holds only if $\alpha < \frac{1}{2}$.

Solution. Evidently

$$E(x_k) = 0, \quad E(x_k^2) = k^{2\alpha}, \quad E|x_k|^3 = k^{3\alpha}.$$

From Euler’s formula (Appendix I) we derive two asymptotic expressions 


Hence 


yjla+l 

+ 2 *« + . . . + 

-p 1 


l*a 2»« -b . . . -b n*« 


3a -f 1 


(2a 4- 1)* 

’ 3a4-l 


Wfi —> 0 


so that the limit theorem holds. For a — 3^ the probability of the inequalities 


Xl + X* 4* • • • + Xu ^ 
— € < - < € 


tends to the limit 




V 2 O 

e-i-'du = 


J'e-'‘'du 


and the law of large numbers does not hold. 

2. Let $m_i$ be the number of successes in $i$ Bernoullian trials with the probability $p$.
Show that the limit theorem holds for variables

$$s_i = \frac{m_i - ip}{\sqrt{ipq}}; \qquad i = 1, 2, \ldots n$$

but the law of large numbers does not hold (Bernstein).
Hint: 


+•••+«» = (pg) * 


[( 


1 +- 7 = + 

V 2 




\/i 




where Xi, Xa, . . . Xn are independent variables with two values q and — p associated 
in the customary way with trials 1,2, . . . n. 

3. Consider an infinite sequence of independent variables $x_1, x_2, x_3, \ldots$ where
$x_k$ can have three values

$$0, \quad (\log k)^{\mu}, \quad -(\log k)^{\mu}$$

with the corresponding probabilities

$$1 - \frac{1}{(k + a)\{\log (k + a)\}^{\rho}}, \quad \frac{1}{2(k + a)\{\log (k + a)\}^{\rho}}, \quad \frac{1}{2(k + a)\{\log (k + a)\}^{\rho}},$$

$a$ being a sufficiently large constant. Moreover, $\mu$ and $\rho$ satisfy the inequality

$$2\mu - \rho + 1 > 0.$$

Show (a) that Liapounoff's condition is satisfied when $\rho < 1$ and hence the limit
theorem holds; (b) that this condition is not satisfied if $\rho \ge 1$ and at the same time the
limit theorem fails at least for $\rho > 1$.

Solution, a. By using Euler’s formula we find 


(2p + 1 - p) |(p_l) 


Hence the first part is answered. 

6. The probability of the inequality 


+ Xj + • • • -+* Xfi ^ 0 


is less than 


^2^ 


+ «) Hog (fc + «))«> 


and this, in case p > 1, is less than 


--(log 

p - 1 

Hence, the probability of the equality 

H- Xa + • • • + Xn = 0 


remains always >1-- (log and the limit theorem cannot hold. Note 

p - 1 

that —> 00 because 2 m — p 4- 1 >0. 

4. Prove the asymptotic formula

$$1 + \frac{n}{1} + \frac{n^2}{1\cdot 2} + \cdots + \frac{n^n}{1\cdot 2\cdots n} \sim \frac{e^n}{2},$$

$n$ being a large integer.

Hint: Apply Liapounoff's theorem to $n$ variables distributed according to Poisson's
law with parameter 1.
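
Before attempting a proof, the asymptotic formula of Problem 4 can be examined numerically. The sketch below computes $e^{-n}\left(1 + n/1 + \cdots + n^n/n!\right)$, which the formula asserts to be near 1/2 for large $n$; logarithms are used only to avoid overflow, and the function name is hypothetical.

```python
import math

def poisson_partial_ratio(n):
    """e^(-n) * sum_{k=0}^{n} n^k / k!, each term computed through its logarithm."""
    return sum(
        math.exp(k * math.log(n) - n - math.lgamma(k + 1))
        for k in range(n + 1)
    )

for n in (10, 100, 1000, 10000):
    print(n, poisson_partial_ratio(n))
```

The printed values decrease slowly toward 1/2, which is consistent with a correction of the order $1/\sqrt{n}$ as in the limit theorem.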

5. By resorting to the fundamental lemma, prove the following theorem due to
Markoff: If for a variable $s_n$ with the mean = 0 and the standard deviation = 1

$$\lim E(s_n^k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x^k e^{-\frac{x^2}{2}}\,dx$$

for any given $k = 3, 4, 5, \ldots$, then the probability of the inequality $s_n < t$ tends
to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.$$


6. In many special cases the limit of the error term can be considerably lower than 
that given in Sec. 6. For instance, if variables Xi, xt, . . . Xn are identical and uni¬ 
formly distributed in the interval — K the probability F«(0 of the inequality 


X\ Xi ‘ ‘ 


differs from 



by less (in absolute value) than 



e ^ du 


+ 


12 
- e 

TT^n 


ir*n 

24 


the last two terms being completely negligible for somewhat large n. 
Indication of the Proof. First establish the inequalities 


sin ip 
- - < e 

*p 


6 


^ g 6 135 


for 0 ^ ^ ^ t/2. 


Further, represent Fnit) by the integral 



and split it into two integrals taken between 0 and Tr\^/\/T2 and ir\/n/y/\2 and 

+ 

7. Supposing again that x\, Xi, . . . Xn are identical and uniformly distributed in 

the interval 14, pro /e that for n ^ 2 

+ —0 < 0 < 1 . 

(K) V n 

8. Let $s_n$ be a variable with the mean = 0 and standard deviation = 1. If its
characteristic function $\varphi_n(t)$ tends to $e^{-\frac{t^2}{2}}$ as $n \to \infty$ uniformly in any finite interval

show that 


E\xi Xi -\- 


4 - Xn\ 




Hint: 
9. If independent variables $x_1, x_2, \ldots x_n$ with means = 0 satisfy Liapounoff's
condition, prove that


E\xi -f- a?* 



10. Show that for a simple chain of trials

$$E|m - np| \sim \sqrt{\frac{2npq}{\pi}\cdot\frac{1 + \delta}{1 - \delta}},$$

$p$ being the mean probability in infinite series of trials and $\delta = p' - p''$.

11. A series of dependent trials can be illustrated by the following urn scheme:
Two urns, 1 and 2, contain white and black balls in such proportions that the probability
of drawing a white ball from 1 is $p$, whereas the probability of drawing a
white ball from 2 is $q = 1 - p$. Whenever a ball taken from an urn is white, the
next ball is taken from the same urn, but if it is black, the next ball is drawn from the
other urn. The urn at the first drawing is selected by lot, the probabilities of selecting
the first or the second urn being given. Evidently the course of trials is determined
by these rules without any ambiguity. Let $m$ denote the number of white balls
obtained in $n$ drawings and let

$$\alpha = p^2 + q^2.$$

Show that the probability of the inequality 

m — net < t\/La{\ — at)n; 
approaches the limit 

r 


2(1 - 3pg) 

1 - 2p« 




Indication of the Proof. Let 


p(l) , p(*) . p(l) , p'A) 

* tii,n> * m.n^ * * m.fi 

be the probabilities of having $m$ white balls in $n$ trials when (a) the last ball is white
and from urn 1; (b) the last ball is white and from urn 2; (c) the last ball is black and
from urn 1; and (d) the last ball is black and from urn 2. The sum

Pm,. - Pi'l 

represents the probability of having exactly m white balls in n trials, 
functions of probabilities P^l^ satisfy the following equations 


The generating 






whence it can be shown that they all, as well as their sum (the generating function of
$P_{m,n}$), satisfy the same equation of the second order

$$y_{n+2} - t\,y_{n+1} + pq(t^2 - 1)\,y_n = 0.$$

Setting $t = e^{iu}$, one of the characteristic roots will be given by

$$e^{(1 - 2pq)iu - 4pq(1 - 3pq)\frac{u^2}{2} + \cdots}$$

for small $u$, while the other root tends to 0 as $u \to 0$. The final conclusion can now
be reached in the same way as in Examples 1 and 2, pages 297 and 301.
CHAPTER XV


NORMAL DISTRIBUTION IN TWO DIMENSIONS. LIMIT 
THEOREM FOR SUMS OF INDEPENDENT VECTORS. 
ORIGIN OF NORMAL CORRELATION 

1. The concept of normal distribution can easily be extended to two 
and more variables. Since the extension to more than two variables 
does not involve new ideas, we shall confine ourselves to the case of 
two-dimensional normal distribution. 

Two variables, $x$, $y$, are said to be normally distributed if for them
the density of probability has the form

$$e^{-\varphi}$$

where

$$\varphi = ax^2 + 2bxy + cy^2 + 2dx + 2ey + f$$

is a quadratic function of $x$, $y$ becoming positive and infinitely large
together with $|x| + |y|$. This requirement is fulfilled if, and only if,

$$ax^2 + 2bxy + cy^2$$

is a positive quadratic form. The necessary and sufficient conditions
for this are:

$$a > 0; \quad ac - b^2 = \Delta > 0.$$

Since $\Delta > 0$ (even a milder requirement $\Delta \ne 0$ suffices), constants $x_0$, $y_0$
can be found so that

$$\varphi = a(x - x_0)^2 + 2b(x - x_0)(y - y_0) + c(y - y_0)^2 + g$$

identically in $x$, $y$. It follows that the density of probability $e^{-\varphi}$ may be
presented thus:

$$e^{-\varphi} = Ke^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}.$$

The expression in the right member depends on six parameters $K$;
$a$, $b$, $c$; $x_0$, $y_0$. But the requirement

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\varphi}\,dx\,dy = 1$$

reduces the number of independent parameters to five. We can take
$a$, $b$, $c$; $x_0$, $y_0$ for independent parameters and determine $K$ by the condition

$$K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}\,dx\,dy = 1$$
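which, by introducing new variables

$$\xi = x - x_0, \quad \eta = y - y_0$$

can be exhibited thus

$$K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a\xi^2 - 2b\xi\eta - c\eta^2}\,d\xi\,d\eta = 1.$$

To evaluate this and similar double integrals we observe that the positive
quadratic form

$$a\xi^2 + 2b\xi\eta + c\eta^2$$

can be presented in infinitely many ways as a sum of two squares

$$a\xi^2 + 2b\xi\eta + c\eta^2 = (\alpha\xi + \beta\eta)^2 + (\gamma\xi + \delta\eta)^2,$$

whence

$$a = \alpha^2 + \gamma^2; \quad c = \beta^2 + \delta^2; \quad b = \alpha\beta + \gamma\delta$$

and

$$(\alpha\delta - \beta\gamma)^2 = \Delta.$$

By changing the signs of $\alpha$ and $\beta$ if necessary, we can always suppose

$$\alpha\delta - \beta\gamma = +\sqrt{\Delta}.$$

Now we take

$$u = \alpha\xi + \beta\eta; \quad v = \gamma\xi + \delta\eta$$

for new variables of integration. Since the Jacobian of $u$, $v$ with respect
to $\xi$, $\eta$ is $\sqrt{\Delta}$, the Jacobian of $\xi$, $\eta$ with respect to $u$, $v$ will be $1/\sqrt{\Delta}$ and,
by the known rules

$$\frac{K}{\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-u^2 - v^2}\,du\,dv = \frac{\pi K}{\sqrt{\Delta}} = 1, \quad K = \frac{\sqrt{\Delta}}{\pi}.$$

That is, the general expression for the density of probability in two-dimensional
normal distribution is

$$\frac{\sqrt{ac - b^2}}{\pi}\,e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}.$$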
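
The value $K = \sqrt{\Delta}/\pi$ can be confirmed by a crude numerical integration. The sketch below sums the density over a fine grid; the coefficients $a = 1$, $b = 0.5$, $c = 2$, the size of the square and the step are all arbitrary assumptions, and the function names are hypothetical.

```python
import math

def density(x, y, a, b, c, x0=0.0, y0=0.0):
    """(sqrt(ac - b^2)/pi) * exp(-a xi^2 - 2 b xi eta - c eta^2)."""
    xi, eta = x - x0, y - y0
    k = math.sqrt(a * c - b * b) / math.pi
    return k * math.exp(-a * xi * xi - 2 * b * xi * eta - c * eta * eta)

def total_mass(a, b, c, half_width=8.0, step=0.05):
    """Midpoint sum of the density over the square |x|, |y| <= half_width."""
    n = int(round(2 * half_width / step))
    pts = [-half_width + (i + 0.5) * step for i in range(n)]
    return sum(density(x, y, a, b, c) for x in pts for y in pts) * step * step

print(total_mass(1.0, 0.5, 2.0))
```

The sum should be equal to 1 within the accuracy of the quadrature, for any coefficients satisfying $a > 0$ and $ac - b^2 > 0$.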


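2. Parameters $x_0$, $y_0$ represent the mean values of variables $x$, $y$.
To prove this, let us consider

$$E(x - x_0) = \frac{\sqrt{\Delta}}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - x_0)\,e^{-a(x - x_0)^2 - 2b(x - x_0)(y - y_0) - c(y - y_0)^2}\,dx\,dy.$$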
To evaluate the double integral, we can express $x$ and $y$ through new
variables $u$, $v$ introduced in the preceding section. We have

$$x - x_0 = \frac{\delta u - \beta v}{\sqrt{\Delta}}; \quad y - y_0 = \frac{-\gamma u + \alpha v}{\sqrt{\Delta}}$$

and

$$E(x - x_0) = \frac{1}{\pi\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (\delta u - \beta v)\,e^{-u^2 - v^2}\,du\,dv = 0,$$

whence

$$E(x) = x_0,$$

and similarly

$$E(y) = y_0.$$

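3. Having found the meaning of $x_0$, $y_0$ we may consider instead of $x$, $y$,
variables $x - x_0$, $y - y_0$ whose mean values = 0. Denoting these new
variables by $x$, $y$ again the expression of the density of probability for
$x$, $y$ will be:

$$\frac{\sqrt{ac - b^2}}{\pi}\,e^{-ax^2 - 2bxy - cy^2}.$$

It contains only three parameters, $a$, $b$, $c$. To find the intrinsic meaning
of $a$, $b$, $c$ let us consider the mathematical expectation of $(x + \lambda y)^2$
where $\lambda$ is an arbitrary constant. We have

$$E(x + \lambda y)^2 = \frac{\sqrt{\Delta}}{\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x + \lambda y)^2\,e^{-ax^2 - 2bxy - cy^2}\,dx\,dy$$

or, introducing $u$, $v$ defined as in Sec. 1 as new variables of integration,

$$\begin{aligned} E(x + \lambda y)^2 &= \frac{1}{\pi\Delta}\int\!\!\int \left[(\delta - \lambda\gamma)^2u^2 + 2(\delta - \lambda\gamma)(-\beta + \lambda\alpha)uv + (\beta - \lambda\alpha)^2v^2\right]e^{-u^2 - v^2}\,du\,dv \\ &= \frac{1}{\pi\Delta}\left[(\delta - \lambda\gamma)^2 + (\beta - \lambda\alpha)^2\right]\int\!\!\int u^2e^{-u^2 - v^2}\,du\,dv \\ &= \frac{\delta^2 + \beta^2}{2\Delta} - 2\lambda\,\frac{\alpha\beta + \gamma\delta}{2\Delta} + \lambda^2\,\frac{\alpha^2 + \gamma^2}{2\Delta}, \end{aligned}$$

whence, since

$$\delta^2 + \beta^2 = c, \quad \gamma^2 + \alpha^2 = a, \quad \alpha\beta + \gamma\delta = b,$$

$$E(x^2) + 2\lambda E(xy) + \lambda^2E(y^2) = \frac{c}{2\Delta} - 2\lambda\frac{b}{2\Delta} + \lambda^2\frac{a}{2\Delta},$$

and since $\lambda$ is arbitrary

$$E(x^2) = \frac{c}{2\Delta}, \quad E(xy) = -\frac{b}{2\Delta}, \quad E(y^2) = \frac{a}{2\Delta}.$$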
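On the other hand, if $\sigma_1$, $\sigma_2$, and $r$ are respectively standard deviations
of $x$, $y$ and their correlation coefficient, we have

$$E(x^2) = \sigma_1^2, \quad E(xy) = r\sigma_1\sigma_2, \quad E(y^2) = \sigma_2^2.$$

Hence

$$\frac{c}{2\Delta} = \sigma_1^2, \quad \frac{b}{2\Delta} = -r\sigma_1\sigma_2, \quad \frac{a}{2\Delta} = \sigma_2^2$$

and

$$\frac{ac - b^2}{4\Delta^2} = \frac{1}{4\Delta} = \sigma_1^2\sigma_2^2(1 - r^2)$$

or

$$2\Delta = \frac{1}{2\sigma_1^2\sigma_2^2(1 - r^2)}.$$

Finally,

$$a = \frac{1}{2\sigma_1^2(1 - r^2)}, \quad b = -\frac{r}{2\sigma_1\sigma_2(1 - r^2)}, \quad c = \frac{1}{2\sigma_2^2(1 - r^2)},$$

$$\sqrt{\Delta} = \frac{1}{2\sigma_1\sigma_2\sqrt{1 - r^2}}.$$

With these values for $a$, $b$, $c$, and $\sqrt{\Delta}$ the density of probability can
be presented as follows:

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}$$

and the probability for a point $x$, $y$ to belong to a given domain $D$ will be
expressed by the double integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int\!\!\int e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right]}\,dx\,dy$$

extended over $D$.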
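
The passage between the coefficients $a$, $b$, $c$ and the quantities $\sigma_1$, $\sigma_2$, $r$ is purely algebraic and can be written as a pair of conversion routines. This is a sketch with hypothetical function names; the numerical values at the end are arbitrary.

```python
import math

def to_moments(a, b, c):
    """(a, b, c) -> (sigma1, sigma2, r), using
    E(x^2) = c/(2 Delta), E(xy) = -b/(2 Delta), E(y^2) = a/(2 Delta)."""
    d = a * c - b * b                  # Delta, assumed positive
    s1 = math.sqrt(c / (2 * d))
    s2 = math.sqrt(a / (2 * d))
    r = (-b / (2 * d)) / (s1 * s2)
    return s1, s2, r

def to_coefficients(s1, s2, r):
    """(sigma1, sigma2, r) -> (a, b, c) by the final formulas of Sec. 3."""
    k = 2 * (1 - r * r)
    return 1 / (k * s1 * s1), -r / (k * s1 * s2), 1 / (k * s2 * s2)

a, b, c = to_coefficients(1.5, 0.8, -0.6)
print(a, b, c)
print(to_moments(a, b, c))
```

Converting in one direction and back should reproduce the starting values, which is a convenient check on the signs, in particular on the minus sign in $b$.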

4. Curves

$$\frac{1}{2(1 - r^2)}\left[\left(\frac{x}{\sigma_1}\right)^2 - 2r\frac{x}{\sigma_1}\frac{y}{\sigma_2} + \left(\frac{y}{\sigma_2}\right)^2\right] = l = \text{const.}$$

are evidently similar and similarly placed ellipses with the common
center at the origin. For obvious reasons they are called ellipses of
equal probability. The area of an ellipse corresponding to a given value
of $l$ (ellipse $l$) is

$$\frac{\pi l}{\sqrt{\Delta}} = 2\pi l\sigma_1\sigma_2\sqrt{1 - r^2},$$
whence the area of an infinitesimal ring between ellipses $l$ and $l + dl$
has the expression

$$2\pi\sigma_1\sigma_2\sqrt{1 - r^2}\,dl.$$

The infinitesimal probability for a point $x$, $y$ to lie in that ring is
expressed by

$$e^{-l}\,dl.$$

Finally, by integrating this expression between limits $l_1$ and $l_2 > l_1$, we
find

$$e^{-l_1} - e^{-l_2}$$

as the expression of the probability for $x$, $y$ to belong to the ring between
two ellipses $l_1$ and $l_2$. If $l_1 = 0$ and $l_2 = l$,

$$1 - e^{-l}$$

gives the probability for $x$, $y$ to belong to the ellipse $l$.

If $n$ numbers $l, l_1, l_2, \ldots l_{n-1}$ are determined by the conditions

$$1 - e^{-l} = e^{-l} - e^{-l_1} = \cdots = e^{-l_{n-2}} - e^{-l_{n-1}} = \frac{1}{n + 1}$$

the whole plane is divided into $n + 1$ regions of equal probability:
namely, the interior of the ellipse $l$, rings between $l, l_1$; $l_1, l_2$; $\ldots$ $l_{n-2}, l_{n-1}$
and, finally, part of the plane outside of the ellipse $l_{n-1}$.
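
The conditions determining $l, l_1, \ldots l_{n-1}$ can be solved explicitly: writing $l_0 = l$, they give $e^{-l_j} = (n - j)/(n + 1)$. The sketch below, with a hypothetical function name and an arbitrary $n$, computes these levels and the probabilities of the resulting regions.

```python
import math

def equal_probability_levels(n):
    """Levels l, l_1, ..., l_{n-1} with exp(-l_j) = (n - j)/(n + 1), where l_0 = l."""
    return [math.log((n + 1) / (n - j)) for j in range(n)]

levels = equal_probability_levels(4)
inside = [1 - math.exp(-l) for l in levels]   # probability of lying inside each ellipse
regions = (
    [inside[0]]                                       # interior of the ellipse l
    + [hi - lo for lo, hi in zip(inside, inside[1:])]  # successive rings
    + [1 - inside[-1]]                                # outside the last ellipse
)
print(levels)
print(regions)
```

With $n = 4$ the plane is divided into five regions and each entry of `regions` should be equal to 1/5 within rounding.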

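5. To find the distribution function of the variable $x$ (without any
regard to $y$), we must take for $D$ the domain

$$-\infty < x < t; \quad -\infty < y < +\infty.$$

As the integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}\,dx\int_{-\infty}^{\infty} e^{-\frac{\left(y - r\frac{\sigma_2}{\sigma_1}x\right)^2}{2(1 - r^2)\sigma_2^2}}\,dy = \frac{1}{\sigma_1\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}\,dx,$$

we see that the probability of the inequality

$$x < t$$

is expressed by

$$\frac{1}{\sigma_1\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{x^2}{2\sigma_1^2}}\,dx.$$

Similarly, the probability of the inequality

$$y < t$$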
is

$$\frac{1}{\sigma_2\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{y^2}{2\sigma_2^2}}\,dy.$$

Thus, if two variables $x$, $y$ are normally distributed with their
means = 0, each one of them taken separately has a normal distribution
of probability with the common mean 0 and the respective standard
deviations $\sigma_1$ and $\sigma_2$. Variables $x$ and $y$ are not independent except when
$r = 0$. For if they were independent the probability of the point
$x$, $y$ belonging to an infinitesimal rectangle

$$t < x < t + dt; \quad \tau < y < \tau + d\tau$$

would be

$$\frac{1}{2\pi\sigma_1\sigma_2}\,e^{-\frac{t^2}{2\sigma_1^2} - \frac{\tau^2}{2\sigma_2^2}}\,dt\,d\tau$$

whereas it is

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\,e^{-\frac{1}{2(1 - r^2)}\left[\left(\frac{t}{\sigma_1}\right)^2 - 2r\frac{t}{\sigma_1}\frac{\tau}{\sigma_2} + \left(\frac{\tau}{\sigma_2}\right)^2\right]}\,dt\,d\tau$$

and these expressions are different unless $r = 0$. Thus, except for $r = 0$,
normally distributed variables are necessarily dependent in the sense
of the theory of probability. Dependent variables are often called
"correlated variables." In particular, variables are said to be in "normal
correlation" when they are normally distributed.

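6. The probability of simultaneous inequalities

$$X < x < X', \quad y < t$$

is represented by the repeated integral

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx\int_{-\infty}^{t} e^{-\frac{\left(y - r\frac{\sigma_2}{\sigma_1}x\right)^2}{2\sigma_2^2(1 - r^2)}}\,dy$$

while

$$\frac{1}{\sigma_1\sqrt{2\pi}}\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx$$

is the probability that $x$ will be contained between $X$ and $X'$. Hence
(Chap. XII, Sec. 10) the ratio

$$\frac{\dfrac{1}{\sigma_2\sqrt{2\pi(1 - r^2)}}\displaystyle\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx\int_{-\infty}^{t} e^{-\frac{\left(y - r\frac{\sigma_2}{\sigma_1}x\right)^2}{2\sigma_2^2(1 - r^2)}}\,dy}{\displaystyle\int_{X}^{X'} e^{-\frac{x^2}{2\sigma_1^2}}\,dx}$$

can be considered as the probability of the inequality

$$y < t$$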
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it being known that x is contained between X and X\ Considering X' as 
variable and converging to X the above ratio evidently tends to the 
limit 



5 2ir,*(l-r*)[*' . 


dy 


which can be considered as the distribution function of y when x has a 
fixed value X. Hence, y for x = X has a normal distribution with the 
standard deviation 


— r* 

and the mean 


Y = r^X. 

Interpreted geometrically, this equation represents the so-called 
*^line of regression'' of y on x. 

In a similar way, we conclude that for y = F the distribution of x 
is normal with the standard deviation 


ciy/l ~ r* 

and the mean 


X = r^F. 
cr2 

This equation represents the line of regression of x on y, 
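
The statement that $y$ for a fixed $x = X$ is normal with mean $r\frac{\sigma_2}{\sigma_1}X$ and standard deviation $\sigma_2\sqrt{1-r^2}$ can be checked numerically by dividing the two-dimensional density by the density of $x$. The following is only a minimal sketch under the assumptions of this section (means equal to 0); the function names are illustrative and are not taken from the text.

```python
import math

def joint_density(x, y, sigma1, sigma2, r):
    """Two-dimensional normal density with zero means and correlation r."""
    q = (x / sigma1) ** 2 - 2 * r * (x / sigma1) * (y / sigma2) + (y / sigma2) ** 2
    norm = 2 * math.pi * sigma1 * sigma2 * math.sqrt(1 - r * r)
    return math.exp(-q / (2 * (1 - r * r))) / norm

def normal_density(z, mean, std):
    """One-dimensional normal density."""
    return math.exp(-((z - mean) ** 2) / (2 * std * std)) / (std * math.sqrt(2 * math.pi))

def conditional_y_given_x(sigma1, sigma2, r, X):
    """Mean (line of regression of y on x) and standard deviation of y for x = X."""
    mean = r * sigma2 / sigma1 * X
    std = sigma2 * math.sqrt(1 - r * r)
    return mean, std

def conditional_density_from_joint(y, X, sigma1, sigma2, r):
    """Joint density at (X, y) divided by the density of x at X."""
    return joint_density(X, y, sigma1, sigma2, r) / normal_density(X, 0.0, sigma1)
```

For any choice of the parameters, `conditional_density_from_joint(y, X, ...)` should agree with `normal_density(y, mean, std)` where `mean, std = conditional_y_given_x(...)`, up to rounding.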

LIMIT THEOREM FOR SUMS OF INDEPENDENT VECTORS

7. So far normal distribution in two dimensions has been considered 
abstractly without indication of its natural origin. One-dimensional 
normal distribution may be considered as a limiting case of probability 
distributions of sums of independent variables. In the same manner 
two-dimensional normal distribution or normal correlation appears as a 
limit of probability distributions of sums of independent vectors. 

Two series of stochastic variables

$$x_1, x_2, \ldots x_n$$
$$y_1, y_2, \ldots y_n$$

define $n$ stochastic vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots \mathbf{v}_n$ so that $x_i$, $y_i$ represent
components of $\mathbf{v}_i$ on two fixed coordinate axes. If

$$E(x_i) = a_i, \qquad E(y_i) = b_i,$$

the vector $\mathbf{a}_i$ with the components $a_i$, $b_i$ is called the mean value of $\mathbf{v}_i$.
Evidently the mean value of

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n$$

is represented by the vector

$$\mathbf{a} = \mathbf{a}_1 + \mathbf{a}_2 + \cdots + \mathbf{a}_n$$

and that of $\mathbf{v} - \mathbf{a}$ is a vanishing vector. Without loss of generality
we may assume at the outset that

$$E(x_i) = E(y_i) = 0; \qquad i = 1, 2, \ldots n,$$

in which case $E(\mathbf{v}) = 0$. Vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots \mathbf{v}_n$ are said to be
independent if variables $x_i$, $y_i$ are independent of the rest of the variables
$x_j$, $y_j$ where $j \neq i$.

In what follows we shall deal exclusively with independent vectors. 

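8. As before, let $x_k$, $y_k$ be components of the vector

$$\mathbf{v}_k \quad (k = 1, 2, \ldots n).$$

Then

$$X = x_1 + x_2 + \cdots + x_n$$
$$Y = y_1 + y_2 + \cdots + y_n$$

will be the components of the sum

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n.$$

If

$$E(x_k) = E(y_k) = 0$$
$$E(x_k^2) = b_k, \qquad E(y_k^2) = c_k, \qquad E(x_k y_k) = d_k,$$

then

$$E(X) = 0, \qquad E(Y) = 0$$
$$E(X^2) = b_1 + b_2 + \cdots + b_n = B_n$$
$$E(Y^2) = c_1 + c_2 + \cdots + c_n = C_n$$
$$E(XY) = d_1 + d_2 + \cdots + d_n = r_n\sqrt{B_n}\sqrt{C_n}$$

because

$$E(x_i y_j) = 0 \quad\text{if}\quad j \neq i,$$

variables $x_i$ and $y_j$ being independent.

Let us introduce instead of variables $x_k$, $y_k$ $(k = 1, 2, \ldots n)$ new
variables

$$\xi_k = \frac{x_k}{\sqrt{B_n}}, \qquad \eta_k = \frac{y_k}{\sqrt{C_n}}$$

and correspondingly

$$s = \frac{X}{\sqrt{B_n}}, \qquad \sigma = \frac{Y}{\sqrt{C_n}}$$

instead of $X$, $Y$. We shall have:

$$E(\xi_k) = E(\eta_k) = 0$$
$$E(\xi_k^2) = \frac{b_k}{B_n}, \qquad E(\eta_k^2) = \frac{c_k}{C_n}$$

and

$$E(s) = E(\sigma) = 0$$
$$E(s^2) = E(\sigma^2) = 1$$
$$E(s\sigma) = r_n.$$

The quantity $r_n$, the correlation coefficient of $s$ and $\sigma$, is in absolute value
$\leq 1$. We define

$$\varphi(u, v) = E\left(e^{i(us + v\sigma)}\right)$$

as the characteristic function of the vector $s$, $\sigma$. Evidently $\varphi(u, 0)$ and
$\varphi(0, v)$ are respectively the characteristic functions of $s$ and $\sigma$. Since

$$e^{i(us+v\sigma)} = e^{i(u\xi_1+v\eta_1)} \cdot e^{i(u\xi_2+v\eta_2)} \cdots e^{i(u\xi_n+v\eta_n)}$$

and the factors in the right-hand member represent independent variables,
we shall have

$$\varphi(u, v) = E\left(e^{i(u\xi_1+v\eta_1)}\right)E\left(e^{i(u\xi_2+v\eta_2)}\right)\cdots E\left(e^{i(u\xi_n+v\eta_n)}\right).$$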
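
The bookkeeping of this section, namely that the second moments of independent vectors add up to $B_n$, $C_n$ and $r_n\sqrt{B_n}\sqrt{C_n}$, can be sketched in a few lines. This is an illustrative helper only; the name `normalized_correlation` and the list-based interface are assumptions, not part of the text.

```python
import math

def normalized_correlation(b, c, d):
    """Given b[k] = E(x_k^2), c[k] = E(y_k^2), d[k] = E(x_k y_k) for
    independent zero-mean vectors, return (B_n, C_n, r_n)."""
    B = sum(b)           # E(X^2)
    C = sum(c)           # E(Y^2)
    D = sum(d)           # E(XY); cross terms vanish by independence
    return B, C, D / math.sqrt(B * C)
```

Since $|d_k| \leq \sqrt{b_k c_k}$ for every admissible vector, the returned $r_n$ never exceeds 1 in absolute value.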

9. For what follows it is very important to investigate the behavior
of $\varphi(u, v)$ when $n$ increases indefinitely while $u$, $v$ do not exceed an
arbitrary but fixed number $l$ in absolute value.

Let

$$E|x_k|^3 = f_k, \qquad E|y_k|^3 = g_k$$

and

$$\omega_n = \frac{f_1 + f_2 + \cdots + f_n}{B_n^{3/2}}, \qquad \eta_n = \frac{g_1 + g_2 + \cdots + g_n}{C_n^{3/2}}.$$

If $\omega_n$ and $\eta_n$ tend to 0 as $n \to \infty$, we shall have

$$(1)\qquad \left|\varphi(u, v) - e^{-\frac{1}{2}(u^2 + 2r_n uv + v^2)}\right| < e^{4l^3(\omega_n + \eta_n)} - 1$$

provided

$$|u| \leq l, \qquad |v| \leq l$$

and $n$ is so large as to make

$$2l^2\left(\omega_n^{2/3} + \eta_n^{2/3}\right) < 1.$$


e«»f*+«i.) = 1 + t(uf* + vtik) — 2 («f* + + 


+ 101 <1, 


we shall have: 


‘ - A ”+ 


+ + Wkl^; W < 1. 


On the other hand, 


1 - ~u 

* rk ^ 


2B»“ 2Vb;:^. 2C,' 




+ ^iE(.ui, + w,m io"i < 1 


bk 2dk Ck 

£?(€«“{.+>”*>) = e 2®'“’ 2v^"’ + L[E{uik + t;,*)*]* + 


+ Q^lu^k + Vrik\^, 


Furthermore, 


Eiu^k + vrik)^ g + 2wjl??i + rjl) < 1 


because 


£(«) = :^ < “i- < "i’t 


[E{u^k + vriky]^ < [E{u(k + vrik^V = + vrik\^ 

E\u^k 4 - vrjk\^ ^ 

Taking into account these various inequalities, we may write 


^(g<(«e*+v,,)) g 2Ba“ 2Vb;:^ ' zc* ^ ^ 
where

Finally, 

1 ^) = + crOd + cr,) . . . (1 + crj 

and 


\<l>(u, v) - <; gM+|»,|+.. .+|^| « 1 < e4l»(«,+,H) - 1 


as was stated. 

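10. Theorem. Let $P$ denote the probability of simultaneous inequalities

$$t_0 \leq s < t_1; \qquad \tau_0 \leq \sigma < \tau_1.$$

Provided $r_n$ remains less than a fixed number $\alpha < 1$ in absolute value and
the above introduced quantities $\omega_n$, $\eta_n$ tend to 0 as $n \to \infty$, $P$ can be expressed
as

$$P = \frac{1}{2\pi\sqrt{1-r_n^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{1}{2(1-r_n^2)}(t^2 - 2r_n t\tau + \tau^2)}\,dt\,d\tau + \Delta_n$$

where $\Delta_n$ tends to 0 uniformly in $t_0$, $t_1$; $\tau_0$, $\tau_1$.

If, in addition, $r_n$ itself tends to the limit $r$ $(|r| < 1)$, $P$ will tend uniformly
to

$$\frac{1}{2\pi\sqrt{1-r^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{1}{2(1-r^2)}(t^2 - 2rt\tau + \tau^2)}\,dt\,d\tau.$$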

Proof. a. In trying to extend Liapounoff's proof to the present case
we introduce an auxiliary quantity $\Pi$ defined as

Using the inequality 

—= I e~''dt < for X > 0, 

VirJ* 2 

one can easily derive the following inequalities: 


( 2 ) 


1 P*' / u — 1 \» rri / v — o \* 

e V * ; rfu • e \ I dv 
h^Jto Jro 


< l(e-C^)* + e-(^ + r(V)‘ + e-(V)') 
2 
$$t_0 \leq s < t_1; \qquad \tau_0 \leq \sigma < \tau_1,$$


(4) duJ^VC’*’)*< l(e (">') +«“(V)'4. 

if at least one of the inequalities (3) is not fulfilled. From the definition 
of n, P and from (2) and (4) it follows that 

|P - nj < JbCc'CV) + + e~(^) + e“(^)’). 

But referring to (1) and setting

$$e^{4l^3(\omega_n + \eta_n)} - 1 = \alpha_n(l)$$

we have by virtue of the developments in Chap. XIV, Sec. 3,

(5) If* - n| < 2a,(0 + AV2 + ‘ • 

b. Replacing $t_1$, $\tau_1$ by variable quantities $t$, $\tau$ and taking the second
derivative of $\Pi$ with respect to $t$ and $\tau$, we get




dtdr \hhr 


On the other hand 


IT 4irv-«j~ 


** ——(u*+l») 


whence 


^ v)dud«. 

Here we substitute

$$\varphi(u, v) = e^{-\frac{1}{2}(u^2 + 2r_n uv + v^2)} + g(u, v).$$

For all real $u$, $v$

$$|g(u, v)| \leq 2.$$

If $|u| \leq l$, $|v| \leq l$, where $l$ is an arbitrarily fixed number, and $n$ is large
enough, we have

$$|g(u, v)| \leq e^{4l^3(\omega_n+\eta_n)} - 1 = \alpha_n(l).$$

Hence, the double integral


^J*4 (« +» v)dudv 


4t*. 

extended over the region outside of the square $|u| \leq l$, $|v| \leq l$ is less than

**/« 




. 


e ^ rdr < 


A* 


in absolute value. The same double integral extended over the square
$|u| \leq l$, $|v| \leq l$ is less than

in absolute value. Thus, referring to (6)


dm 

dtdr 






and 


IBl < Ja.(i) + 


Now 


and 




= + |X|<1 


f 


gr-J(«i+2r,«p+*i)(^J + v^)dudv = 


A* 


A* 


4t( 1 — r*)* 4ir(l — a*)* 


Hence 



g-|(u«+2r»ut>+v«)g^t(fu+ri>)^||^j; R/ 


and 


ifi'i < 5a»(i) 


(«)« 

4 


A* 


4t{1 - a*)«‘ 


By transformation to new variables 


{ » M + r,!;; ij = »-\/l - r* 
the foregoing double integral becomes

1 C‘ f- 


1 


=e 2(1-r.') 


SO that finally 


dqi ^ ^ 

dtdr 2irVl - r* 


rn* = 

_ _1_ 

2t\/1 — 1 




«*-2r»<T+r») 


Integrating this expression with respect to $t$ and $\tau$ between limits $t_0$, $t_1$
and $\tau_0$, $\tau_1$, we get:


(7) 


n = 


2irVl 


1 r- r 

1 — •i*’* 


g 2(1-r,*) ^ p 


where 

(8) |p| < (ti — to)(Ti — To) 


(«)* 


Vn(0 + ! 




+ 




4t( 1 - a2)2 


Hence combining inequality (5) with (7) and (8), 

'•ti _1 


27 r\/l 


HJ'T 


where 
1A„| < 


g 2(1-r»*) 


(«)« 

1^2 + ^{h — to){Ti — To) ja«(0 + J “I" 


+ (^1 — W(^l 


- r„)| + hV2 


+ 


(fl — to)(Tl — To)^* 
3 

47r(l - a2)2 

Considering $t_0$, $t_1$; $\tau_0$, $\tau_1$ as variable and denoting an arbitrarily large
number by $L$, we shall assume at first that the rectangle $D$

$$t_0 \leq s \leq t_1; \qquad \tau_0 \leq \sigma \leq \tau_1$$

is completely contained in the square $Q$:

$$|s| \leq L, \qquad |\sigma| \leq L.$$

Then, taking h = we shall have 


|A„| < (2 + + le' 


V A=l ^ + 4L"j + V 2 I ^ 


Vv^ 


»(1 - a *)5 
Given an arbitrary positive number $\varepsilon$, we take $l$ so large as to have


U 


^ + 4lA + \/2l i + 




L*i-> 


t( 1 - a»)» 


-s < 2*- 


After that, since $\alpha_n(l) \to 0$ as $n \to \infty$ (for a fixed $l$) we can find a number
$n_0(\varepsilon)$ so that




for $n > n_0(\varepsilon)$. Finally, we shall have

$$|\Delta_n| < \varepsilon$$

as soon as $n > n_0(\varepsilon)$; that is, $\Delta_n$ tends to 0 uniformly in any rectangle $D$
contained in the square $Q$ with an arbitrarily large side $2L$.

c. To prove that $\Delta_n$ tends to 0 uniformly no matter what are $t_0$, $t_1$;
$\tau_0$, $\tau_1$ we observe that the integral

$$\frac{1}{2\pi\sqrt{1-r_n^2}}\iint e^{-\frac{1}{2(1-r_n^2)}(t^2 - 2r_n t\tau + \tau^2)}\,dt\,d\tau$$

extended over the area outside of $Q$ becomes infinitesimal as $L \to \infty$.
Accordingly, we take $L$ so large as to make this integral $< \varepsilon/2$ (no matter
what $n$ is) and in addition to have $L^{-2} < \varepsilon/4$. The number $L$ selected
according to these requirements will be kept fixed.

Let $D'$ represent that part of $D$ which is inside $Q$, the remaining part or
parts (if there are any) being $D''$. Let $P'$ and $P''$ denote the probabilities
that the point $s$, $\sigma$ shall be contained in $D'$ or $D''$, respectively. Also,
let $J'$ and $J''$ be the integrals

$$\frac{1}{2\pi\sqrt{1-r_n^2}}\iint e^{-\frac{1}{2(1-r_n^2)}(t^2 - 2r_n t\tau + \tau^2)}\,dt\,d\tau$$

extended over $D'$ and $D''$, respectively. By what has been proved, given
$\varepsilon > 0$ a number $n_0(\varepsilon)$ can be found so that

$$|P' - J'| < \varepsilon$$

for $n > n_0(\varepsilon)$. Now

$$P = P' + P''; \qquad J = J' + J'',$$

whence

$$|P - J| < \varepsilon + P'' + J''$$

for $n > n_0(\varepsilon)$. Since by Tshebysheff's lemma (Chap. X, Sec. 1) the
probability of either one of the inequalities

$$|s| > L \quad\text{or}\quad |\sigma| > L$$

is less than $1/L^2$ (because $E(s^2) = E(\sigma^2) = 1$), we shall have

$$P'' < \frac{2}{L^2} < \frac{\varepsilon}{2}.$$

Also,

$$J'' < \frac{\varepsilon}{2},$$

whence

$$|P - J| < 2\varepsilon$$

for $n > n_0(\varepsilon)$; that is, the difference

$$P - \frac{1}{2\pi\sqrt{1-r_n^2}}\int_{t_0}^{t_1}\int_{\tau_0}^{\tau_1} e^{-\frac{1}{2(1-r_n^2)}(t^2 - 2r_n t\tau + \tau^2)}\,dt\,d\tau$$

tends to 0 uniformly, no matter what $t_0$, $t_1$; $\tau_0$, $\tau_1$ are.

Finally, the last statement of the theorem appears as almost evident
and does not require an elaborate proof.

11. The theorem just proved concerns the asymptotic behavior of
the probability $P$ of simultaneous inequalities

$$t_0 \leq s < t_1; \qquad \tau_0 \leq \sigma < \tau_1$$

which, due to the definition of $s$ and $\sigma$, are equivalent to the inequalities

$$t_0\sqrt{B_n} \leq x_1 + x_2 + \cdots + x_n < t_1\sqrt{B_n}$$
$$\tau_0\sqrt{C_n} \leq y_1 + y_2 + \cdots + y_n < \tau_1\sqrt{C_n}.$$

From the geometrical standpoint the above domain of $s$, $\sigma$ is a rectangle.
But the theorem can be extended to the case of any given
domain $R$ for the point $s$, $\sigma$. It is hardly necessary to enter into details
of the proof based on the definition of a double integral. It suffices to
state the theorem itself:

Fundamental Theorem. The probability for the point $(s, \sigma)$ to be
located in a given domain $R$ can be represented, for large $n$, by the integral

$$\frac{1}{2\pi\sqrt{1-r_n^2}}\iint e^{-\frac{1}{2(1-r_n^2)}(s^2 - 2r_n s\sigma + \sigma^2)}\,ds\,d\sigma$$

extended over $R$, with an error which tends uniformly to 0 as $n$ becomes
infinite, provided

$$\omega_n \to 0, \qquad \eta_n \to 0,$$

while for all $n$

$$|r_n| < \alpha < 1.$$

In less precise terms we may say that under very general conditions 
the probability distribution of the components of a vector which is the 
sum of a great many independent vectors will be nearly normal. 

The first rigorous proof of the limit theorem for sums of independent
vectors was published by S. Bernstein in 1926. Like the proof developed
here it proceeds on the same lines as Liapounoff's proof for sums of
independent variables. Moreover, Bernstein has shown that the limit
theorem may hold even in case of dependent vectors when certain additional
conditions are fulfilled.

12. A good illustration of the fundamental theorem is afforded by
series of independent trials with three alternatives, $E$, $F$, $G$. For the
sake of simplicity we shall assume that probabilities of $E$, $F$, $G$ are
$p$, $q$, $r$ in all trials. Naturally

$$p + q + r = 1.$$

In the usual way, we associate with these trials triads of variables

$$x_i, y_i, z_i \qquad (i = 1, 2, 3, \ldots)$$

so that

$x_i = 1$ or 0 according as $E$ occurs or fails at the $i$th trial;

$y_i = 1$ or 0 according as $F$ occurs or fails at the $i$th trial;

$z_i = 1$ or 0 according as $G$ occurs or fails at the $i$th trial.

Evidently

$$E(x_i) = E(x_i^2) = p$$
$$E(y_i) = E(y_i^2) = q$$

so that vectors $\mathbf{v}_i$ with components

$$\xi_i = x_i - p, \qquad \eta_i = y_i - q$$

have their means = 0. The independence of trials involves the independence
of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots \mathbf{v}_n$. Hence we can apply the preceding
considerations to the vector

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_n$$

with the components

$$X = \xi_1 + \xi_2 + \cdots + \xi_n$$
$$Y = \eta_1 + \eta_2 + \cdots + \eta_n.$$

We have

$$B_n = E(X^2) = np(1 - p); \qquad C_n = E(Y^2) = nq(1 - q).$$

Moreover,

$$E(\xi_i\eta_i) = E(x_i y_i) - pq = -pq$$
$$E(XY) = r_n\sqrt{B_n}\sqrt{C_n} = -npq$$

whence

$$r_n = -\frac{pq}{\sqrt{pq(1-p)(1-q)}}.$$

The quantities denoted by $f_k$, $g_k$ in Sec. 9 are in our case

$$f_k = E|\xi_k|^3 = p(1 - p)^3 + (1 - p)p^3$$
$$g_k = E|\eta_k|^3 = q(1 - q)^3 + (1 - q)q^3.$$

Hence

$$\omega_n = \frac{p(1-p)^3 + (1-p)p^3}{n^{1/2}p^{3/2}(1-p)^{3/2}}, \qquad \eta_n = \frac{q(1-q)^3 + (1-q)q^3}{n^{1/2}q^{3/2}(1-q)^{3/2}},$$

and the conditions

$$\omega_n \to 0, \qquad \eta_n \to 0$$

are satisfied. The fundamental theorem, therefore, can be applied.
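
The two ingredients of this illustration, the correlation coefficient $r_n = -\sqrt{pq/((1-p)(1-q))}$ and the quantity $\omega_n$ decreasing like $n^{-1/2}$, can be examined with a small simulation. The sketch below is only an illustration under the assumptions of this section; the function names, the seed and the sample sizes are arbitrary choices, not taken from the text.

```python
import math
import random

def theoretical_r(p, q):
    """Correlation coefficient r_n of the discrepancies X and Y."""
    return -math.sqrt(p * q / ((1 - p) * (1 - q)))

def omega(p, n):
    """omega_n for trials in which E has probability p."""
    f = p * (1 - p) ** 3 + (1 - p) * p ** 3      # E|xi_k|^3
    return f / (math.sqrt(n) * (p * (1 - p)) ** 1.5)

def simulate_discrepancies(p, q, n, runs, seed=0):
    """Return `runs` pairs (k - np, l - nq) from series of n trials each."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(runs):
        k = l = 0
        for _ in range(n):
            u = rng.random()
            if u < p:            # event E
                k += 1
            elif u < p + q:      # event F
                l += 1
        pairs.append((k - n * p, l - n * q))
    return pairs

def sample_correlation(pairs):
    """Ordinary sample correlation coefficient of a list of pairs."""
    m = len(pairs)
    mx = sum(x for x, _ in pairs) / m
    my = sum(y for _, y in pairs) / m
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)
```

With a few thousand simulated series the sample correlation of the discrepancies should lie close to `theoretical_r(p, q)`, while `omega(p, n)` shrinks as `n` grows.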
If $k$, $l$, $m$ are the respective frequencies of events $E$, $F$, $G$ in $n$ trials, the
quantities $X$ and $Y$ represent the discrepancies

$$\lambda = k - np, \qquad \mu = l - nq.$$

Introducing the third discrepancy

$$\nu = m - nr$$

we shall have

$$\lambda + \mu + \nu = 0$$

so that $\nu$ is determined when $\lambda$ and $\mu$ are given. The last two quantities,
however, may have various values depending on chance. Concerning
them the following statement follows from the fundamental theorem:

Theorem. The probability that discrepancies $\lambda$, $\mu$ in $n$ trials shall
simultaneously satisfy the inequalities

$$\alpha_0\sqrt{n} < \lambda < \alpha_1\sqrt{n}; \qquad \beta_0\sqrt{n} < \mu < \beta_1\sqrt{n}$$

tends uniformly, with indefinitely increasing $n$, to the limit

$$\frac{1}{2\pi\sqrt{pqr}}\int_{\alpha_0}^{\alpha_1}\int_{\beta_0}^{\beta_1} e^{-\frac{1}{2}\left(\frac{\alpha^2}{p} + \frac{\beta^2}{q} + \frac{\gamma^2}{r}\right)}\,d\alpha\,d\beta$$

where, to have symmetrical notation, $\gamma$ is a variable defined by

$$\alpha + \beta + \gamma = 0.$$

On account of symmetry, perfectly similar statements can be made in
regard to any pair of the discrepancies $\lambda$, $\mu$, $\nu$.

Since the fundamental theorem and its proof can be extended without
any difficulty to vectors of more than two dimensions, we shall have
in the case of trials with more than three alternatives a result perfectly
analogous to the last theorem.

Theorem. Each of $n$ independent trials admits of $k$ alternatives $E_1$,
$E_2, \ldots E_k$ the probabilities and the frequencies of which respectively are
$p_1, p_2, \ldots p_k$ and $m_1, m_2, \ldots m_k$. The probability that the discrepancies
$m_i - np_i$ $(i = 1, 2, \ldots k - 1)$ should satisfy simultaneously the
inequalities

$$\alpha_i\sqrt{n} < m_i - np_i < \beta_i\sqrt{n}$$

tends uniformly, with indefinitely increasing $n$, to the limit

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_1p_2\cdots p_k}}\int_{\alpha_1}^{\beta_1}\cdots\int_{\alpha_{k-1}}^{\beta_{k-1}} e^{-\frac{1}{2}\sum_{i=1}^{k}\frac{t_i^2}{p_i}}\,dt_1\cdots dt_{k-1}$$

where

$$t_k = -(t_1 + t_2 + \cdots + t_{k-1}).$$

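From this theorem, by resorting to the definition of a multiple integral,
we may deduce an important corollary: Let $P_n$ denote the probability of the
inequality

$$(A)\qquad \frac{(m_1 - np_1)^2}{np_1} + \frac{(m_2 - np_2)^2}{np_2} + \cdots + \frac{(m_k - np_k)^2}{np_k} < \chi^2.$$

Then, as $n$ tends to infinity, $P_n$ tends to the limit

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}\sqrt{p_1p_2\cdots p_k}}\int\!\!\int\cdots\int e^{-\frac{1}{2}\left(\frac{t_1^2}{p_1} + \frac{t_2^2}{p_2} + \cdots + \frac{t_k^2}{p_k}\right)}\,dt_1\,dt_2\cdots dt_{k-1}$$

where, with $t_k = -(t_1 + t_2 + \cdots + t_{k-1})$, the integration is extended over the $(k - 1)$ dimensional ellipsoid

$$\varphi = \frac{t_1^2}{p_1} + \frac{t_2^2}{p_2} + \cdots + \frac{t_k^2}{p_k} < \chi^2.$$

It is easy to see that the determinant of the quadratic form $\varphi$ in
$(k - 1)$ variables is $(p_1p_2\cdots p_k)^{-1}$. Hence, by a proper linear transformation
the above integral reduces to

$$\frac{1}{(2\pi)^{\frac{k-1}{2}}}\int\!\!\int\cdots\int e^{-\frac{1}{2}(v_1^2 + v_2^2 + \cdots + v_{k-1}^2)}\,dv_1\,dv_2\cdots dv_{k-1},$$

the domain of integration being $v_1^2 + v_2^2 + \cdots + v_{k-1}^2 < \chi^2$. But
this multiple integral, as will be shown in Chap. XVI, Sec. 1, can be
reduced to a simple integral

$$\frac{1}{2^{\frac{k-1}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_0^{\chi^2} e^{-\frac{u}{2}}u^{\frac{k-3}{2}}\,du.$$

Thus

$$\lim P_n = \frac{1}{2^{\frac{k-1}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_0^{\chi^2} e^{-\frac{u}{2}}u^{\frac{k-3}{2}}\,du.$$

The probability $Q_n = 1 - P_n$ of the opposite inequality

$$\frac{(m_1 - np_1)^2}{np_1} + \frac{(m_2 - np_2)^2}{np_2} + \cdots + \frac{(m_k - np_k)^2}{np_k} \geq \chi^2$$

tends to the limit

$$\frac{1}{2^{\frac{k-1}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{\chi^2}^{\infty} e^{-\frac{u}{2}}u^{\frac{k-3}{2}}\,du$$

and for large $n$ we have an approximate formula

$$Q_n \approx \frac{1}{2^{\frac{k-1}{2}}\Gamma\left(\frac{k-1}{2}\right)}\int_{\chi^2}^{\infty} e^{-\frac{u}{2}}u^{\frac{k-3}{2}}\,du,$$

but the degree of approximation remains unknown. In practice, to
test whether the observed deviations of frequencies from their expected
values are significant, the value of the sum (A), say $\chi^2$, is found; then
by the above approximate formula the probability that the sum (A) will
be greater than $\chi^2$ is computed. If this probability is very small, then
the obtained system of deviations is significantly different from what
could be expected as a result of chance alone. The lack of information
as to the error incurred by using an approximate expression of $Q_n$ renders
the application of this "$\chi^2$-test" devised by Pearson somewhat dubious.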
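
The practical procedure just described, computing the sum (A) from observed frequencies and then evaluating the approximate expression of $Q_n$, can be sketched as follows. This is a minimal illustration and not a definitive implementation: the function names are hypothetical, and the tail integral is evaluated through a standard power series for the incomplete gamma function, which is a numerical choice made here and not a method given in the text.

```python
import math

def chi_square_sum(frequencies, probabilities):
    """The sum (A): sum of (m_i - n p_i)^2 / (n p_i) over the k alternatives."""
    n = sum(frequencies)
    return sum((m - n * p) ** 2 / (n * p)
               for m, p in zip(frequencies, probabilities))

def upper_tail(chi2, k):
    """Approximate Q_n for k alternatives:
    1 / (2^((k-1)/2) Gamma((k-1)/2)) * integral from chi2 to infinity
    of exp(-u/2) u^((k-3)/2) du."""
    a = (k - 1) / 2.0
    x = chi2 / 2.0
    if x <= 0:
        return 1.0
    # Series: lower part = x^a e^(-x) / Gamma(a) * sum_{j>=0} x^j / (a (a+1) ... (a+j))
    term = 1.0 / a
    total = term
    j = 0
    while abs(term) > 1e-16 * abs(total):
        j += 1
        term *= x / (a + j)
        total += term
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return 1.0 - lower
```

For example, with three alternatives (`k = 3`) the tail reduces to $e^{-\chi^2/2}$, which gives a quick sanity check. The series is adequate for moderate values of $\chi^2$; for very large values the subtraction `1.0 - lower` loses accuracy.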


Hypothetical Explanation of Empirically Verified Cases of 
Normal Correlation 

13. Normal distribution in two dimensions plays an important part 
in target practice. It is generally assumed on the basis of varied evidence 
collected in actual target practice that points of a target hit by projectiles 
are scattered in a manner suggesting normal distribution. By referring 
points hit by projectiles to a fixed coordinate system on the target, it is
possible from their coordinates to find approximately (provided the 
number of shots is large) the elements of ellipses of equal probability. 
Dividing the surface of the target into regions of equal probabilities as 
described in Sec. 4, and counting the actual number of hits in each 
region, the resulting numbers in many reported instances are nearly 
equal. That and the agreement with other criteria are generally con¬ 
sidered as evidence in favor of assuming the probability in target 
practice to be normally distributed. 

Two-dimensional normal distribution or normal correlation has been 
found to exist between measurable attributes, such as the length of the 
body and weight of living organisms. Attributes like statures of parents 
and their descendants, according to Galton, again show evidence of 
normal correlation. 

Facing such a variety of facts pointing to the existence of normal 
correlation, one is tempted to account for it by some more or less plausible 
hypothesis. It is generally assumed that deviations of two magnitudes 
from their mean values are caused by the combined action of a great 
many independent causes, each affecting both magnitudes in a very small 
degree. Clearly, the resulting deviations under such circumstances may 
be regarded as components of the sum of a great many independent 
vectors. Then, to explain the existence of normal correlation, reference 
is made to the fundamental theorem in Sec. 11. 


Problems for Solution 

1. Let $p$ denote the probability that two normally distributed variables (with
means = 0) will have values of opposite signs. Show that between $p$ and the correlation
coefficient $r$ the following relation holds:

$$r = \cos p\pi.$$

2. Variables $x$, $y$ (with the means = 0) are normally distributed. Show that the
probability for the point $x$, $y$ to be located in an ellipse

$$\frac{x^2}{\sigma_1^2} - 2r\frac{xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} = l$$

is greater than the probability corresponding to any other domain of the same area.

3. Three dice colored in white, red, and blue are tossed simultaneously n times. 
Let X and Y represent the total number of points on pairs: white, red and white, blue 
Show that the probability of simultaneous inequalities 


7n + < jr < 7n + 7n + <Y <^n+ 

tends to the limit 


1 p 

CIT’D 




as $n \to \infty$.

4. Three dice, white, red, and blue, are tossed simultaneously $n$ times. If $k$ and $l$
are frequencies of 10 points on pairs: white, red; red, blue; show that the probability
of simultaneous inequalities


n /Tl 

tends to the limit 


11 


144 




11 , n 




27r\/l‘20jt„ Jro 


as n —► 00 . 

5. Two players, A and B, take part in a game arranged as follows: Each time one 
ball is taken from an urn containing 8 white, 6 black, and 1 red ball; if this ball is 

white, A and B both gain $1; 
black, A loses $2, B loses $4; 
red, A gains $4, B gains $16. 

Let $s_n$ and $\sigma_n$ be the sums gained by A and B after $n$ games. Show that the probability
of simultaneous inequalities


< 8n < toV^ 48n < <r« < 

for very large n will be approximately equal to 



-6((*fr*-VV<T)^dr. 


Note that the probability of the inequality $s_n\sigma_n < 0$ is about 0.13, which is not very small,
so that it is not very unlikely that the luck will be with one player and against another.

6. Concentric circles $C_1, C_2, C_3, \ldots$ in unlimited numbers are described about
the origin $O$. Points $P_1, P_2, P_3, \ldots$ are taken at random on these circles. Let $R$
be the end point of the vector representing the sum of vectors $OP_1, OP_2, OP_3, \ldots$.
If $r_1, r_2, r_3, \ldots$ are radii of $C_1, C_2, C_3, \ldots$ and the condition

$$\frac{r_1^3 + r_2^3 + \cdots + r_n^3}{(r_1^2 + r_2^2 + \cdots + r_n^2)^{3/2}} \to 0 \quad\text{as}\quad n \to \infty$$

is fulfilled, show that the probability that $R$ will lie within the circle described with the
radius $\rho$ about the origin will be very nearly equal to

$$1 - e^{-\frac{\rho^2}{r_1^2 + r_2^2 + \cdots + r_n^2}}$$

for large $n$.
CHAPTER XVI


DISTRIBUTION OF CERTAIN FUNCTIONS OF NORMALLY 
DISTRIBUTED VARIABLES 


1. In modern statistics much emphasis is laid upon distributions of
certain functions involving normally distributed variables. Such distributions
are considered as a basis for various "tests of significance"
for small samples, that is, when the number of observed data is small.
Some of the most important cases of this kind will be considered in this
chapter.

Problem 1. Independent variables $x_1, x_2, \ldots x_n$ are normally
distributed about their common mean = 0 with the same standard
deviation $\sigma$. Find the distribution function of the sum of their squares

$$s = x_1^2 + x_2^2 + \cdots + x_n^2.$$
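Solution. The inequality

$$x_i^2 < t$$

being equivalent to

$$-\sqrt{t} < x_i < \sqrt{t},$$

the distribution function of $x_i^2$ is

$$F_1(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\sqrt{t}}^{\sqrt{t}} e^{-\frac{x^2}{2\sigma^2}}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{-\frac{1}{2}}\,du \quad\text{for}\quad t \geq 0$$

$$F_1(t) = 0 \quad\text{for}\quad t < 0.$$

Hence, the characteristic function of any one of the variables $x_1^2, x_2^2,
\ldots x_n^2$ is

$$\frac{1}{\sigma\sqrt{2\pi}}\int_0^\infty e^{-\frac{u}{2\sigma^2} + ivu}u^{-\frac{1}{2}}\,du = (\sigma\sqrt{2})^{-1}\left(\frac{1}{2\sigma^2} - iv\right)^{-\frac{1}{2}}$$

and that of their sum

$$(\sigma\sqrt{2})^{-n}\left(\frac{1}{2\sigma^2} - iv\right)^{-\frac{n}{2}}.$$

Consequently, the distribution function of $s$ is expressed by

$$F(t) = C + \frac{(\sigma\sqrt{2})^{-n}}{2\pi}\int_{-\infty}^{\infty}\frac{1 - e^{-itv}}{iv}\left(\frac{1}{2\sigma^2} - iv\right)^{-\frac{n}{2}}dv$$

and it remains to transform this integral. To this end, imagine a variable
distributed over the interval $(0, +\infty)$ with the density

$$\frac{(\sigma\sqrt{2})^{-n}}{\Gamma\left(\frac{n}{2}\right)}\,e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2}-1}.$$

Its characteristic function is

$$(\sigma\sqrt{2})^{-n}\left(\frac{1}{2\sigma^2} - iv\right)^{-\frac{n}{2}}$$

and since the distribution function is given a priori, we must have for
$t \geq 0$

$$\frac{(\sigma\sqrt{2})^{-n}}{\Gamma\left(\frac{n}{2}\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2}-1}\,du = \text{const.} + \frac{(\sigma\sqrt{2})^{-n}}{2\pi}\int_{-\infty}^{\infty}\frac{1 - e^{-itv}}{iv}\left(\frac{1}{2\sigma^2} - iv\right)^{-\frac{n}{2}}dv.$$

Hence

$$F(t) = \text{const.} + \frac{1}{(\sigma\sqrt{2})^{n}\,\Gamma\left(\frac{n}{2}\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2}-1}\,du.$$

The constant must be = 0 since $F(t)$ as well as the integral in the right
member vanishes for $t = 0$. The final expression is therefore:

$$F(t) = \frac{1}{(\sigma\sqrt{2})^{n}\,\Gamma\left(\frac{n}{2}\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2}-1}\,du \quad\text{for}\quad t \geq 0$$

$$F(t) = 0 \quad\text{for}\quad t \leq 0.$$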
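
As a check on the final expression $F(t) = \frac{1}{(\sigma\sqrt{2})^{n}\Gamma(n/2)}\int_0^t e^{-u/2\sigma^2}u^{n/2-1}\,du$, the case $n = 2$ can be worked out by hand, since there the probability is also easy to obtain directly in polar coordinates. The short computation below is added only as an illustration; the polar variables $\rho$, $\theta$ are notation introduced here.

```latex
% Case n = 2 of the final formula, using \Gamma(1) = 1:
F(t) = \frac{1}{2\sigma^2}\int_0^t e^{-\frac{u}{2\sigma^2}}\,du
     = 1 - e^{-\frac{t}{2\sigma^2}} .

% Direct computation with x_1 = \rho\cos\theta, x_2 = \rho\sin\theta:
\frac{1}{2\pi\sigma^2}\int_0^{2\pi}\!\!\int_0^{\sqrt{t}}
   e^{-\frac{\rho^2}{2\sigma^2}}\,\rho\,d\rho\,d\theta
 = \frac{1}{\sigma^2}\int_0^{\sqrt{t}} e^{-\frac{\rho^2}{2\sigma^2}}\rho\,d\rho
 = 1 - e^{-\frac{t}{2\sigma^2}} .
```

Both routes give the same value, as they should.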

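The probability of the inequality

$$x_1^2 + x_2^2 + \cdots + x_n^2 < t,$$

on the other hand, can be expressed directly as a multiple integral

$$\frac{1}{(\sigma\sqrt{2\pi})^n}\int\!\!\int\cdots\int e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n$$

extended over the volume of the $n$-dimensional sphere $S$

$$x_1^2 + x_2^2 + \cdots + x_n^2 < t.$$

By equating both expressions of $F(t)$, we obtain an important transformation,

$$(1)\qquad \int\!\!\int\cdots\int_S e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n = \frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2}\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac{n}{2}-1}\,du.$$

If $F(x_1^2 + x_2^2 + \cdots + x_n^2)$ is an arbitrary function of

$$u = x_1^2 + x_2^2 + \cdots + x_n^2,$$

the integral

$$\frac{1}{(\sigma\sqrt{2\pi})^n}\int\!\!\int\cdots\int e^{-\frac{x_1^2 + \cdots + x_n^2}{2\sigma^2}}F(x_1^2 + \cdots + x_n^2)\,dx_1\,dx_2\cdots dx_n$$

extended over the whole $n$-dimensional space represents the mathematical
expectation of $F(u)$. On the other hand, the distribution function of
$u$ being known the same multiple integral will be equal to

$$\frac{1}{(\sigma\sqrt{2})^{n}\,\Gamma\left(\frac{n}{2}\right)}\int_0^\infty e^{-\frac{u}{2\sigma^2}}F(u)\,u^{\frac{n}{2}-1}\,du.$$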

Taking in particular <r = 1, F{u) = we get the formula 

r r r • • * +x»*)-i-av'j^i*+^ 

jj j* 


( 2 ) 


dxidx-i 


dx„ = 




n~2 

2 dw. 


which will be used later, 

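2. Problem 2. Variables $x_1, x_2, \ldots x_n$ are defined as in Prob. 1.
Denoting their arithmetic mean by

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

find the distribution function of the sum

$$S = (x_1 - s)^2 + (x_2 - s)^2 + \cdots + (x_n - s)^2.$$

Solution. The probability of the inequality

$$S < t$$

is expressed by the multiple integral

$$F(t) = \frac{1}{(\sigma\sqrt{2\pi})^n}\int\!\!\int\cdots\int e^{-\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{2\sigma^2}}\,dx_1\,dx_2\cdots dx_n$$

extended over the volume of the $n$-dimensional ellipsoid

$$(x_1 - s)^2 + (x_2 - s)^2 + \cdots + (x_n - s)^2 < t.$$

Let

$$x_1 - s = u_1, \quad x_2 - s = u_2, \quad \ldots \quad x_n - s = u_n$$

whence

$$u_1 + u_2 + \cdots + u_n = 0$$

and

$$x_1^2 + x_2^2 + \cdots + x_n^2 = u_1^2 + u_2^2 + \cdots + u_n^2 + ns^2.$$

Taking $u_1, u_2, \ldots u_{n-1}$ and $s$ for new variables, we must first find the
Jacobian $J$ of $x_1, x_2, \ldots x_n$ with respect to $u_1, u_2, \ldots u_{n-1}, s$. It is

$$J = \begin{vmatrix} 1 & 1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 0 & 1 & \cdots & 0 \\ \cdot & \cdot & \cdot & \cdot & \cdots & \cdot \\ 1 & 0 & 0 & 0 & \cdots & 1 \\ 1 & -1 & -1 & -1 & \cdots & -1 \end{vmatrix} = \begin{vmatrix} 1 & 1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 0 & 1 & \cdots & 0 \\ \cdot & \cdot & \cdot & \cdot & \cdots & \cdot \\ 1 & 0 & 0 & 0 & \cdots & 1 \\ n & 0 & 0 & 0 & \cdots & 0 \end{vmatrix} = (-1)^{n-1}n.$$

In the new variables the expression for $F(t)$ will be

$$F(t) = \frac{n}{(\sigma\sqrt{2\pi})^n}\int\!\!\int\cdots\int e^{-\frac{ns^2}{2\sigma^2}}\,e^{-\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{2\sigma^2}}\,ds\,du_1\,du_2\cdots du_{n-1}$$

and the domain of integration in the space of the new variables is defined
by

$$-\infty < s < \infty$$
$$u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2 < t.$$

After performing the integration with respect to $s$, we get

$$F(t) = \frac{\sqrt{n}}{(\sigma\sqrt{2\pi})^{n-1}}\int\!\!\int\cdots\int e^{-\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{2\sigma^2}}\,du_1\,du_2\cdots du_{n-1}.$$

The quadratic form

$$\varphi = u_1^2 + u_2^2 + \cdots + u_{n-1}^2 + (u_1 + u_2 + \cdots + u_{n-1})^2$$

can be represented as a sum of the squares of $(n - 1)$ linear forms in
variables $u_1, u_2, \ldots u_{n-1}$:

$$\varphi = v_1^2 + v_2^2 + \cdots + v_{n-1}^2.$$

The Jacobian

$$\frac{\partial(v_1, v_2, \ldots v_{n-1})}{\partial(u_1, u_2, \ldots u_{n-1})}$$

is the square root of the determinant of the form $\varphi$, which is the same
as the determinant of linear forms

$$\frac{1}{2}\frac{\partial\varphi}{\partial u_1} = 2u_1 + u_2 + \cdots + u_{n-1}$$
$$\cdots$$
$$\frac{1}{2}\frac{\partial\varphi}{\partial u_{n-1}} = u_1 + u_2 + \cdots + 2u_{n-1}.$$

Now, in general, for a determinant of order $p$

$$\begin{vmatrix} \lambda & 1 & 1 & \cdots & 1 \\ 1 & \lambda & 1 & \cdots & 1 \\ \cdot & \cdot & \cdot & \cdots & \cdot \\ 1 & 1 & 1 & \cdots & \lambda \end{vmatrix} = (\lambda - 1)^{p-1}(\lambda + p - 1)$$

so that the determinant of $\varphi$ is $= n$, whence

$$\frac{\partial(v_1, v_2, \ldots v_{n-1})}{\partial(u_1, u_2, \ldots u_{n-1})} = \sqrt{n}$$

and

$$\frac{\partial(u_1, u_2, \ldots u_{n-1})}{\partial(v_1, v_2, \ldots v_{n-1})} = \frac{1}{\sqrt{n}}.$$

Therefore, taking $v_1, v_2, \ldots v_{n-1}$ for new variables, $F(t)$ can be expressed
as follows

$$F(t) = \frac{1}{(\sigma\sqrt{2\pi})^{n-1}}\int\!\!\int\cdots\int e^{-\frac{v_1^2 + v_2^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2\cdots dv_{n-1}$$

where the integral is extended over the volume of the sphere

$$v_1^2 + v_2^2 + \cdots + v_{n-1}^2 < t.$$

This multiple integral is exactly of the type considered in the preceding
problem, and it can be reduced to a simple integral as follows

$$\int\!\!\int\cdots\int e^{-\frac{v_1^2 + \cdots + v_{n-1}^2}{2\sigma^2}}\,dv_1\,dv_2\cdots dv_{n-1} = \frac{\pi^{\frac{n-1}{2}}}{\Gamma\left(\frac{n-1}{2}\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac{n-3}{2}}\,du.$$

After substitution, the final expression of $F(t)$ is

$$F(t) = \frac{1}{(\sigma\sqrt{2})^{n-1}\,\Gamma\left(\frac{n-1}{2}\right)}\int_0^t e^{-\frac{u}{2\sigma^2}}u^{\frac{n-3}{2}}\,du \quad\text{for}\quad t > 0$$

$$F(t) = 0 \quad\text{for}\quad t \leq 0.$$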
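
A useful consequence of this result is that $S$ is distributed exactly like the sum of squares of Prob. 1 with $n - 1$ variables instead of $n$; in particular its mathematical expectation is $(n-1)\sigma^2$ and not $n\sigma^2$. The computation below is added only as an illustration and uses nothing beyond the density obtained above and the relation $\Gamma(a + 1) = a\Gamma(a)$.

```latex
% Density of S for u > 0, by differentiating F(t):
%   \frac{1}{(\sigma\sqrt{2})^{n-1}\Gamma\left(\frac{n-1}{2}\right)}
%   e^{-\frac{u}{2\sigma^2}} u^{\frac{n-3}{2}} .
E(S) = \frac{1}{(\sigma\sqrt{2})^{n-1}\,\Gamma\left(\frac{n-1}{2}\right)}
       \int_0^\infty e^{-\frac{u}{2\sigma^2}}\,u^{\frac{n-1}{2}}\,du
     = \frac{(2\sigma^2)^{\frac{n+1}{2}}\,\Gamma\left(\frac{n+1}{2}\right)}
            {(2\sigma^2)^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}
     = 2\sigma^2\cdot\frac{n-1}{2}
     = (n-1)\,\sigma^2 .
```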


3. Problem 3. Variables $x_1, x_2, \ldots x_n$ are defined as in Prob. 1.
As in Prob. 2, we set

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

$$u_i = x_i - s; \qquad i = 1, 2, \ldots n$$

and introduce the quantity

$$\epsilon = \sqrt{\frac{u_1^2 + u_2^2 + \cdots + u_n^2}{n}}.$$

What is the distribution function of the ratio

$$\frac{s}{\epsilon}$$

or, which is the same, the probability $F(t)$ of the inequality

$$s < t\epsilon?$$

Solution. First, assuming t to be positive, let us find the probability 
of the inequality 


u\ + ul+ • • • 

This probability can be presented in the form 






where the multiple integral 


- If P 


• • • +un* 

dUidU2 • • • dUn-l 


in which 


Un ~ (Wl + ^2 + • • • + Un-l) 
is extended over the domain


tij + + * * • -f- + (til + Ma + * • * + tin-l)* ^ 




Proceeding in exactly the same manner as in Prob. 2, we can transform 
^(s) into 


extended over the sphere 

t1+ v| + 




2<r> 


dvidv2 • * • dVn^l 




in the space of the variables Vi, Vj, . . . Vn-i. For this multiple integral 
we can substitute a simple integral 


n—1 n — 1 $ 




_!Ll* 


and thus reduce ^(s) to the form 

-i —1 n —2 a 

'J'(s) 


2^”I“- 2 ri _!!l* 




After substitution we can express as a repeated integral 




■y/ir(<ry/2) 

The derivative of is 


Jo 


0'(O = 


2 nH-' 


n r * 

(n — l\Jo 


V^(vV2)t(^) 






v?r(V) 


(!+«*)■* 
whence


Kl) r 


(1 + «*)» 


^(+®o) 


so that C = 1 and 




(1 + *•)» 


ii) 


4>(t) = 1-f 

vsr(!4iy- 

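Such is the probability of the inequality

$$s > t\epsilon.$$

The probability $F(t)$ of the inequality

$$s < t\epsilon$$

will be $1 - \varphi(t)$ or

$$F(t) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{t}\frac{dz}{(1 + z^2)^{\frac{n}{2}}}$$

but this is established only for positive $t$. However, this result holds
for negative $t$ as well. For $t$ being negative $= -\tau$ the inequality

$$s < -\tau\epsilon$$

is entirely equivalent to

$$-s > \tau\epsilon$$

and its probability is evidently

$$F(-\tau) = \varphi(\tau) = 1 - \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{\tau}\frac{dz}{(1 + z^2)^{\frac{n}{2}}}.$$

But

$$\frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{\infty}\frac{dz}{(1 + z^2)^{\frac{n}{2}}} = 1$$

which permits of writing the preceding expression for $F(-\tau)$ as follows

$$F(-\tau) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{-\tau}\frac{dz}{(1 + z^2)^{\frac{n}{2}}}.$$

Thus, no matter whether $t$ is positive or negative, the distribution function
of the ratio

$$\frac{s}{\epsilon}$$

or the probability of the inequality

$$s < t\epsilon$$

is given by

$$F(t) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\int_{-\infty}^{t}(1 + z^2)^{-\frac{n}{2}}\,dz.$$

The distribution of the quotient $s/\epsilon$ was discovered by a British
statistician who wrote under the pseudonym "Student," and it is commonly
referred to as "Student's distribution." The first rigorous proof
was published by R. A. Fisher.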
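
The distribution function $F(t) = \frac{\Gamma(n/2)}{\sqrt{\pi}\,\Gamma((n-1)/2)}\int_{-\infty}^{t}(1+z^2)^{-n/2}\,dz$ is easy to evaluate numerically. The sketch below is an illustration only: the function name is hypothetical, and the substitution $z = \tan\theta$ together with Simpson's rule is a numerical choice made here, not a method taken from the text. For $n = 2$ the result reduces to $\frac{1}{2} + \frac{1}{\pi}\arctan t$, and for $n = 3$ to $\frac{1}{2} + \frac{t}{2\sqrt{1+t^2}}$, which gives two simple checks.

```python
import math

def student_ratio_cdf(t, n, steps=20000):
    """Probability of s < t * epsilon for a sample of n >= 2 values.

    Uses symmetry, F(t) = 1/2 + const * integral_0^t (1+z^2)^(-n/2) dz,
    and the substitution z = tan(theta), under which
    (1+z^2)^(-n/2) dz = cos(theta)^(n-2) dtheta.
    `steps` must be even (Simpson's rule).
    """
    const = math.exp(math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)) / math.sqrt(math.pi)
    upper = math.atan(t)          # negative for negative t
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        if i == 0 or i == steps:
            weight = 1
        elif i % 2 == 1:
            weight = 4
        else:
            weight = 2
        total += weight * math.cos(i * h) ** (n - 2)
    return 0.5 + const * total * h / 3.0
```

Note that the ratio $s/\epsilon$ of this section differs from the quantity usually tabulated as Student's $t$ with $n - 1$ degrees of freedom only by the constant factor $\sqrt{n-1}$.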

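4. Problem 4. Variables $x$, $y$ are in normal correlation. A sample of
$n$ corresponding pairs, $x_1, y_1$; $x_2, y_2$; $\ldots$ $x_n, y_n$ is taken and the "correlation
coefficient of the sample" is found by the formula

$$\rho = \frac{\Sigma(x_i - s)(y_i - s')}{\sqrt{\Sigma(x_i - s)^2\cdot\Sigma(y_i - s')^2}}$$

where, for the sake of abbreviation,

$$s = \frac{x_1 + x_2 + \cdots + x_n}{n}; \qquad s' = \frac{y_1 + y_2 + \cdots + y_n}{n}.$$

Find the distribution function of $\rho$, that is, the probability $P$ of the
inequality $\rho < R$ for a given $R$ $(-1 < R < 1)$.

Solution. Since the expression of $\rho$ is homogeneous of degree 0 in
$x_1, x_2, \ldots x_n$; $y_1, y_2, \ldots y_n$ we can assume $\sigma_1 = \sigma_2 = 1$. Also without
loss of generality the expectations of $x$ and $y$ may be supposed = 0.
Denoting by $r$ the correlation coefficient of $x$ and $y$, the density of probability
in the two-dimensional distribution will be:

$$\frac{1}{2\pi\sqrt{1-r^2}}\,e^{-\frac{1}{2(1-r^2)}(x^2 + y^2 - 2rxy)}.$$

Hence the required probability will be expressed by the multiple integral

$$P = \frac{1}{(2\pi)^n(1-r^2)^{\frac{n}{2}}}\int\!\!\int\cdots\int e^{-\frac{\varphi}{2(1-r^2)}}\,dx_1\cdots dx_n\,dy_1\cdots dy_n$$

extended over the $2n$-dimensional domain

$$(3)\qquad \Sigma(x_i - s)(y_i - s') < R\sqrt{\Sigma(x_i - s)^2\cdot\Sigma(y_i - s')^2}$$

and

$$(4)\qquad \varphi = \Sigma x_i^2 + \Sigma y_i^2 - 2r\Sigma x_i y_i.$$

Replacing $x_i$, $y_i$ $(i = 1, 2, \ldots n)$, respectively, by $\sqrt{1-r^2}\,x_i$, $\sqrt{1-r^2}\,y_i$,
we can write $P$ thus:

$$P = \frac{(1-r^2)^{\frac{n}{2}}}{(2\pi)^n}\int\!\!\int\cdots\int e^{-\frac{\varphi}{2}}\,dx_1\cdots dx_n\,dy_1\cdots dy_n$$

while (3) and (4) still hold but with the new notation for the variables.
Let us set now

$$x_i - s = u_i, \qquad y_i - s' = v_i,$$

then

$$u_1 + u_2 + \cdots + u_n = 0, \qquad v_1 + v_2 + \cdots + v_n = 0.$$

Introducing $s$, $s'$; $u_1, u_2, \ldots u_{n-1}$; $v_1, v_2, \ldots v_{n-1}$ as new variables, we
find as in Sec. 2

$$P = \frac{n^2(1-r^2)^{\frac{n}{2}}}{(2\pi)^n}\int\!\!\int\cdots\int e^{-\frac{\varphi}{2}}\,ds\,ds'\,du_1\cdots du_{n-1}\,dv_1\cdots dv_{n-1}$$

where

$$\varphi = ns^2 + ns'^2 - 2nrss' + \Sigma u_i^2 + \Sigma v_i^2 - 2r\Sigma u_i v_i$$

and the domain of integration is defined by the inequalities

$$-\infty < s < \infty; \qquad -\infty < s' < \infty$$
$$\Sigma u_i v_i < R\sqrt{\Sigma u_i^2\cdot\Sigma v_i^2}.$$

Now by the same linear transformation the quadratic forms $\Sigma u_i^2$,
$\Sigma v_i^2$ (each containing $n - 1$ independent variables) can be transformed
into

$$\sum_{i=1}^{n-1} w_i^2, \qquad \sum_{i=1}^{n-1} z_i^2;$$

at the same time

$$\Sigma u_i v_i = \sum_{i=1}^{n-1} w_i z_i.$$

Proceeding as in Sec. 2 and noting that

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{n}{2}(s^2 + s'^2 - 2rss')}\,ds\,ds' = \frac{2\pi}{n\sqrt{1-r^2}}$$

we find that

$$P = \frac{(1-r^2)^{\frac{n-1}{2}}}{(2\pi)^{n-1}}\int\!\!\int\cdots\int e^{-\frac{\chi}{2}}\,dw_1\cdots dw_{n-1}\,dz_1\cdots dz_{n-1}$$

where

$$\chi = \Sigma w_i^2 + \Sigma z_i^2 - 2r\Sigma w_i z_i$$

and the domain of integration in the space of $2n - 2$ dimensions is defined
by

$$\Sigma w_i z_i < R\sqrt{\Sigma w_i^2\cdot\Sigma z_i^2}.$$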

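We shall integrate now in regard to variables $z_1, z_2, \ldots z_{n-1}$ for a fixed
system of values $w_1, w_2, \ldots w_{n-1}$. To this end we use an orthogonal
transformation

$$z_1 = c_{1,1}\zeta_1 + c_{1,2}\zeta_2 + \cdots + c_{1,n-1}\zeta_{n-1}$$
$$z_2 = c_{2,1}\zeta_1 + c_{2,2}\zeta_2 + \cdots + c_{2,n-1}\zeta_{n-1}$$
$$\cdots$$
$$z_{n-1} = c_{n-1,1}\zeta_1 + c_{n-1,2}\zeta_2 + \cdots + c_{n-1,n-1}\zeta_{n-1}$$

in which the elements of the first column are

$$c_{i,1} = \frac{w_i}{\sqrt{w_1^2 + w_2^2 + \cdots + w_{n-1}^2}} = \frac{w_i}{w}.$$

Defining $\xi_1, \xi_2, \ldots \xi_{n-1}$ by

$$w_i = c_{i,1}\xi_1 + c_{i,2}\xi_2 + \cdots + c_{i,n-1}\xi_{n-1}; \qquad i = 1, 2, \ldots n - 1$$

we shall have $\xi_1 = w$, $\xi_2 = \cdots = \xi_{n-1} = 0$. By the properties of
orthogonal transformations

$$\Sigma z_i^2 = \Sigma\zeta_i^2, \qquad \Sigma z_i w_i = \Sigma\zeta_i\xi_i = w\zeta_1$$

so that for a fixed system of values $w_1, w_2, \ldots w_{n-1}$ the domain of
integration in the space of variables $\zeta_1, \zeta_2, \ldots \zeta_{n-1}$ will be

$$(6)\qquad \zeta_1 < R\sqrt{\zeta_1^2 + \zeta_2^2 + \cdots + \zeta_{n-1}^2}.$$

Thus we must first evaluate the integral

$$J = \int\!\!\int\cdots\int e^{-\frac{1}{2}(\zeta_1^2 + \cdots + \zeta_{n-1}^2) + rw\zeta_1}\,d\zeta_1\,d\zeta_2\cdots d\zeta_{n-1}.$$

If $\zeta_1 < 0$ no restriction is imposed upon $\zeta_2, \ldots \zeta_{n-1}$; if $\zeta_1 > 0$, then

$$\zeta_2^2 + \cdots + \zeta_{n-1}^2 > \left(\frac{1}{R^2} - 1\right)\zeta_1^2.$$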

Consequently the result of integration in regard to ft, . . . f».i can be 
presented thus: 

where the inner integral is extended over the domain 

fi + • • • + fL. < (^, - i)f? 

and c is a constant. Making use of formula (1), Sec. 1, the expression of 
J reduces to 




'■»-)* -S 


This has to be multiplied by 


- r*) 2 c • • • dwn^i 

and integrated over the whole space of the variables $w_1, w_2, \ldots w_{n-1}$.
The resulting expression for P will be 
P. f f... L-i-

where 


Mdwi 


dwn-i 


Now we differentiate in regard to $R$, reverse the order of integrations,
and make use of formula (2), Sec. 1; the resulting value of $dP/dR$ will
then be expressed as a double integral


_1 n-J n-4 

d^ ^ r 2(1 - r^) 2 (1 - _ 

■ 2-.r(”-i-^)r(” rJ) 


n — 1 \ Jo Jo 


or 


n — 1 n — 4 

dP - r^)~ (1 - R^) * 

M 


r/;«- 


since 


7 rr{n — 2) 


i «* 4-tt*) + Hrtu ^~^dtdu, 


In the double integral we make transformation to new variables f, rj 
defined by 



rj = tu. 


The Jacobian of f, u in regard to {, 17, being we have 


rjy 


-Hl‘+u')+Rrtu 


= ir(n - 1) 


I 


{tuY~^dtdu 

ir'di 




’ f 




^ = r(n 


- 7 ' 


dt 


{cht — Rr) 




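and so, finally,

\[ \frac{dP}{dR} = \frac{n-2}{\pi}\,(1 - r^2)^{\frac{n-1}{2}}(1 - R^2)^{\frac{n-4}{2}} \int_0^\infty \frac{dt}{(\operatorname{ch} t - Rr)^{n-1}}. \]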


In case r = 0, that is, when the variables x, y are uncorrelated, we have 
a very simple expression of P: 





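\[ P = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-2}{2}\right)} \int_{-1}^{R} (1 - \rho^2)^{\frac{n-4}{2}}\,d\rho. \]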

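In case $r \neq 0$ the integral

\[ \int_0^\infty \frac{dt}{(\operatorname{ch} t - Rr)^{n-1}} \]

can still be found in finite form. We have, in fact,

\[ \int_0^\infty \frac{dt}{\operatorname{ch} t - Rr} = \frac{1}{\sqrt{1 - R^2r^2}}\left[\frac{\pi}{2} + \arcsin (Rr)\right], \]

whence

\[ \int_0^\infty \frac{dt}{(\operatorname{ch} t - Rr)^{n-1}} = \frac{1}{(n-2)!}\,\frac{d^{n-2}}{d(Rr)^{n-2}}\left\{\frac{1}{\sqrt{1 - R^2r^2}}\left[\frac{\pi}{2} + \arcsin (Rr)\right]\right\} \]

and so

\[ P = \frac{(1 - r^2)^{\frac{n-1}{2}}}{\pi (n-3)!} \int_{-1}^{R} (1 - \rho^2)^{\frac{n-4}{2}}\,\frac{d^{n-2}}{d(r\rho)^{n-2}}\left\{\frac{1}{\sqrt{1 - r^2\rho^2}}\left[\frac{\pi}{2} + \arcsin (r\rho)\right]\right\} d\rho. \]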

When n is an even number, this integral appears in a very simple finite 
form, but in case of an odd n certain integrals of a rather complicated 
type appear. Besides, the behavior of P for somewhat large n cannot 
be easily grasped by using this integral expression for P. 
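
As a numerical sanity check on the expression $dP/dR = \frac{n-2}{\pi}(1-r^2)^{\frac{n-1}{2}}(1-R^2)^{\frac{n-4}{2}}\int_0^\infty (\operatorname{ch} t - Rr)^{-(n-1)}dt$, the following minimal sketch integrates it over $-1 < R < 1$ and confirms that the total probability is close to 1. The function names, the Simpson rule, and the truncation of the infinite range at `t_max` are assumptions of this illustration and are not part of the text.

```python
import math


def simpson(f, lo, hi, steps):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def density(R, r, n, t_max=20.0, steps=1000):
    """Value of dP/dR for n observations and true correlation r.

    The range 0 <= t < infinity is truncated at t_max (an assumption of
    this sketch); the integrand is negligible there for the n used below.
    """
    inner = simpson(lambda t: (math.cosh(t) - R * r) ** (-(n - 1)),
                    0.0, t_max, steps)
    return ((n - 2) / math.pi
            * (1 - r * r) ** ((n - 1) / 2)
            * (1 - R * R) ** ((n - 4) / 2)
            * inner)


def total_probability(r, n, steps=400):
    """Integral of dP/dR over -1 < R < 1; it should be close to 1."""
    return simpson(lambda R: density(R, r, n), -1.0, 1.0, steps)


if __name__ == "__main__":
    for r in (0.0, 0.6):
        print(r, total_probability(r, 10))
```

The same `density` function can be tabulated for several values of $n$ to see how quickly the distribution concentrates around $r$.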

6. Fisher, who was first to discover the rigorous distribution of the 
correlation coefficient, called attention to the fact that, setting 

^ _2( x> - s) {yi -_s2_ 

\/S(a:i - sytivi - s'p’ 

the distribution of z will be nearly normal even for comparatively small 
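values of $n$. Let us set $R = \operatorname{th}\omega$, $r = \operatorname{th}\zeta$; then $P$ can be expressed thus:

\[ P = \frac{n-2}{\pi}\int_{-\infty}^{\omega}\int_0^\infty \frac{\operatorname{ch} z\,dt\,dz}{(\operatorname{ch} t\,\operatorname{ch} z\,\operatorname{ch}\zeta - \operatorname{sh}\zeta\,\operatorname{sh} z)^{n-1}}. \]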

Instead of t it is convenient to introduce a new variable t so that 


chtchzcH ~ sh^shz = T~-^ch(z — t)- 
Then

P s ^ ^ f" fchz\i dz f V-i(l - T)"-^dT 

ir-\/2 J _ j [c/i(z - f)]“-*Jo Vl - pr 

where 

~h r) ^ c/i(fa) + r) 

^ 2chzch^ “ 2cho)ch^ 

for all values of $z$ under consideration. Now


and 


since 


X- 


*(1 ~ '\/irr'(n — 1) 

“T‘(n - i) 


VI - Pr 

r v-*(i - T)’‘-^<iT v^r(n - 1) /, p \ 
Jo Vl — P’’ ~ i) \ 2n — ly 

< fr-K'! - O-Hl + PT)dr 

Jo vl - pr Jo 
for 0 < p < 1 as can be easily verified. Consequently 


p = ^ 2)r(n — 1) T" /c/igy dz 

'\/^r(n-i) ~ f)]’'■’* ’ 

[ 14 - ^ 1 . 

[ 2cWc/ir2n - ij^ 


0 < 0 < 1 . 


As to the integral in this formula, its approximate expression, omitting 
terms of the higher order, is: 


j: 


e 


2-3 

^ dz — 


tH —(«-r)* 
2n - 3^ 


Thus for somewhat large n the required value of P can be found with 
the help of a simple approximate formula. 

The various distributions dealt with in this chapter are undoubtedly 
of great value when applied to variables which have normal or nearly 
normal distribution. Whether they are always used legitimately can 
be doubted. At least the "onus probandi" that the "populations" with
which they deal are even approximately normal rests with the statisticians. 


Problems for Solution 


1. Show that


J n+*yfl^nun 




6 2 ijZ 


^ f' 

V2irJ- • 


Hint: Liapounoff's theorem and Prob. 1, page 332.

2. With the same assumptions and notations as in Prob. 3, page 336, show that the
distribution function of the quotient 



e 


F{i) 


_ 

F(t) =1 if f > Vn - 1; f(i) =0 if 


r /. 


^ — 4 


if 

t < 


M ^ Vn - 1 

\/ n — 1. 


It is worthy of notice that for $n = 4$ the distribution is uniform.

3. In two series of observations, samples $x_1, x_2, \ldots x_n$ and $y_1, y_2, \ldots y_n$ from
the same normally distributed population (or of the same normally distributed
variable) are obtained. Denoting for brevity


xi -f iCi 4- • 


4 - Xn 


yi 4- Vi 4- • • • 4- yii- 




find the distribution function of the quotient - - — (“Student”). Ans, 
APPENDIX I


1. Euler's Summation Formula. Let $f(x)$ be a function with a
continuous derivative $f'(x)$ in an interval $(a, b)$ where $a$ and $b > a$ are
arbitrary real numbers. The notation

\[ \sum_{n>a}^{n\le b} f(n) \]

will be used to designate the sum extended over all integers $n$ which are
$> a$ and $\le b$. It is an important problem to devise means for the approximate
evaluation of the above sum when it contains a considerable number
of terms.

Let $[x]$, as usual, denote the largest integer contained in a real number
$x$, so that

\[ x = [x] + \theta \]

where $\theta$, so-called "fractional part" of $x$, satisfies the inequalities

\[ 0 \le \theta < 1. \]

Considered as functions of a continuous variable $x$, both $[x]$ and $\theta$ have
discontinuities for integral values of $x$. The function

\[ \rho(x) = \tfrac{1}{2} - \theta = [x] - x + \tfrac{1}{2} \]

is likewise discontinuous for integral values of $x$. Besides, it is a periodic
function of $x$ with the period 1; that is, we have

\[ \rho(x + 1) = \rho(x) \]

for any real $x$. With this notation adopted we have the following
important formula:

\[ \sum_{n>a}^{n\le b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \int_a^b \rho(x)f'(x)\,dx \tag{1} \]

which is known as "Euler's summation formula."

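Proof. Let $k$ be the least integer $> a$ and $l$ the greatest integer $\le b$.
The sum in the left member of (1) is, by definition,

\[ f(k) + f(k + 1) + \cdots + f(l) \]

and we must show that this is equal to the right member. To this end
we write first

\[ \int_a^b \rho(x)f'(x)\,dx = \int_a^k \rho(x)f'(x)\,dx + \int_l^b \rho(x)f'(x)\,dx + \sum_{j=k}^{l-1}\int_j^{j+1} \rho(x)f'(x)\,dx. \]

Next, since $j$ is an integer,

\[ \int_j^{j+1} \rho(x)f'(x)\,dx = \int_j^{j+1} \left(j - x + \tfrac{1}{2}\right)f'(x)\,dx = -\frac{f(j) + f(j+1)}{2} + \int_j^{j+1} f(x)\,dx \]

and

\[ \sum_{j=k}^{l-1}\int_j^{j+1} \rho(x)f'(x)\,dx = \frac{f(l) - f(k)}{2} - \sum_{n=k+1}^{l} f(n) + \int_k^l f(x)\,dx. \]

On the other hand,

\[ \int_a^k \rho(x)f'(x)\,dx = \int_a^k \left(k - 1 - x + \tfrac{1}{2}\right)f'(x)\,dx = -\rho(a)f(a) - \frac{f(k)}{2} + \int_a^k f(x)\,dx \]

\[ \int_l^b \rho(x)f'(x)\,dx = \int_l^b \left(l - x + \tfrac{1}{2}\right)f'(x)\,dx = -\frac{f(l)}{2} + \rho(b)f(b) + \int_l^b f(x)\,dx, \]

so that finally

\[ \int_a^b \rho(x)f'(x)\,dx = -f(k) - f(k+1) - \cdots - f(l) + \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) \]

whence

\[ \sum_{n>a}^{n\le b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \int_a^b \rho(x)f'(x)\,dx, \]

which completes the proof of Euler's formula.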
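
Formula (1) is easy to test numerically. The sketch below compares the direct sum of $f(n)$ over $a < n \le b$ with the right member of (1) for $f(x) = \log x$; the integral of $\rho(x)f'(x)$ is split at the integers, where $\rho$ jumps, exactly as in the proof. The helper names and the use of Simpson's rule are assumptions made for the illustration only.

```python
import math


def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def rho(x):
    """rho(x) = [x] - x + 1/2."""
    return math.floor(x) - x + 0.5


def rho_integral(f_prime, a, b):
    """Integral of rho(x) f'(x) over (a, b), taken piece by piece
    between consecutive integers, where rho(x) = m - x + 1/2."""
    total, left = 0.0, a
    while left < b:
        m = math.floor(left)
        right = min(m + 1, b)
        total += simpson(lambda x: (m - x + 0.5) * f_prime(x), left, right)
        left = right
    return total


def euler_right_member(f, f_prime, antiderivative, a, b):
    """Right member of formula (1)."""
    return (antiderivative(b) - antiderivative(a)
            + rho(b) * f(b) - rho(a) * f(a)
            - rho_integral(f_prime, a, b))


def direct_sum(f, a, b):
    """Sum of f(n) over all integers n with a < n <= b."""
    return sum(f(n) for n in range(math.floor(a) + 1, math.floor(b) + 1))


if __name__ == "__main__":
    F = lambda x: x * math.log(x) - x   # an antiderivative of log x
    print(direct_sum(math.log, 0.5, 10.25))
    print(euler_right_member(math.log, lambda x: 1 / x, F, 0.5, 10.25))
```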

Corollary 1. The integral

\[ \int_0^x \rho(z)\,dz = \sigma(x) \]

represents a continuous and periodic function of $x$ with the period 1. For

\[ \sigma(x + 1) - \sigma(x) = \int_x^{x+1} \rho(z)\,dz = \int_0^1 \rho(z)\,dz = \int_0^1 \left(\tfrac{1}{2} - z\right)dz = 0. \]

If $0 \le x \le 1$,

\[ \sigma(x) = \frac{x(1 - x)}{2} \]

and in general

\[ \sigma(x) = \frac{\theta(1 - \theta)}{2} \]

where $\theta$ is a fractional part of $x$. Hence, for every real $x$

\[ 0 \le \sigma(x) \le \tfrac{1}{8}. \]

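Supposing that $f''(x)$ exists and is continuous in $(a, b)$ and integrating by
parts, we get

\[ \int_a^b \rho(x)f'(x)\,dx = \sigma(b)f'(b) - \sigma(a)f'(a) - \int_a^b \sigma(x)f''(x)\,dx, \]

which leads to another form of Euler's formula:

\[ \sum_{n>a}^{n\le b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \sigma(b)f'(b) + \sigma(a)f'(a) + \int_a^b \sigma(x)f''(x)\,dx. \]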


Corollary 2. If $f(x)$ is defined for all $x \ge a$ and possesses a continuous
derivative throughout the interval $(a, +\infty)$; if, besides, the integral

\[ \int_a^\infty \rho(x)f'(x)\,dx \]

exists, then for a variable limit $b$ we have

\[ \sum_{n>a}^{n\le b} f(n) = C + \int_a^b f(x)\,dx + \rho(b)f(b) + \int_b^\infty \rho(x)f'(x)\,dx \tag{2} \]

where $C$ is a constant with respect to $b$.

It suffices to substitute for

\[ \int_a^b \rho(x)f'(x)\,dx \]

the difference

\[ \int_a^\infty \rho(x)f'(x)\,dx - \int_b^\infty \rho(x)f'(x)\,dx \]

and separate the terms depending upon $b$ from those involving $a$.

2. Stirling's Formula. Factorials increase with extreme rapidity
and their exact computation soon becomes practically impossible. The
question then naturally arises of finding a convenient approximate
expression for large factorials, which question is answered by a celebrated
formula usually known as "Stirling's formula," although, in the main,
it was established by de Moivre in connection with problems on probability.
De Moivre did not establish the relation to the number

\[ \pi = 3.14159 \ldots \]

of the constant involved in his formula; it was done by Stirling.

In formula (2) it suffices to take $a = \tfrac{1}{2}$, $f(x) = \log x$, and replace $b$
by an arbitrary integer $n$ to arrive at the remarkable expression

\[ \log (1 \cdot 2 \cdot 3 \cdots n) = C + \left(n + \tfrac{1}{2}\right)\log n - n + \int_n^\infty \frac{\rho(x)\,dx}{x} \]

where $C$ is a constant. For the sake of brevity we shall set

\[ \omega(n) = \int_n^\infty \frac{\rho(x)\,dx}{x}. \]

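Now

\[ \int_n^\infty \frac{\rho(x)\,dx}{x} = \int_n^{n+1} \frac{\rho(x)\,dx}{x} + \int_{n+1}^{n+2} \frac{\rho(x)\,dx}{x} + \cdots \]

and

\[ \int_k^{k+1} \frac{\rho(x)\,dx}{x} = \int_0^1 \frac{\rho(u)\,du}{u + k} = \int_0^{1/2} \frac{\rho(u)\,du}{u + k} + \int_{1/2}^{1} \frac{\rho(u)\,du}{u + k} = \]

\[ = \int_0^{1/2} \frac{\left(\frac{1}{2} - u\right)du}{u + k} - \int_0^{1/2} \frac{\left(\frac{1}{2} - u\right)du}{k + 1 - u} = \frac{1}{2}\int_0^{1/2} \frac{(1 - 2u)^2\,du}{(k + u)(k + 1 - u)}. \]

Hence

\[ \omega(n) = \tfrac{1}{4}\int_0^{1/2} (1 - 2u)^2 F_n(u)\,du \]

where

\[ F_n(u) = \sum_{k=n}^{\infty} \frac{2}{(k + u)(k + 1 - u)}. \]

Since

\[ (k + u)(k + 1 - u) = k(k + 1) + u - u^2 \]

it follows that for $0 < u < \tfrac{1}{2}$

\[ (k + u)(k + 1 - u) > k(k + 1) \]
\[ (k + u)(k + 1 - u) < \left(k + \tfrac{1}{2}\right)^2 < \left(k + \tfrac{1}{2}\right)\left(k + \tfrac{3}{2}\right). \]

Thus for $0 < u < \tfrac{1}{2}$

\[ F_n(u) < 2\sum_{k=n}^{\infty} \frac{1}{k(k + 1)} = \frac{2}{n} \]

\[ F_n(u) > 2\sum_{k=n}^{\infty} \frac{1}{\left(k + \frac{1}{2}\right)\left(k + \frac{3}{2}\right)} = \frac{2}{n + \frac{1}{2}}. \]

Making use of these limits, we find that

\[ \omega(n) < \frac{1}{2n}\int_0^{1/2} (1 - 2u)^2\,du = \frac{1}{12n} \]

\[ \omega(n) > \frac{1}{2n + 1}\int_0^{1/2} (1 - 2u)^2\,du = \frac{1}{12\left(n + \frac{1}{2}\right)}, \]

and consequently can set

\[ \omega(n) = \frac{1}{12(n + \theta)} \]

where

\[ 0 < \theta < \tfrac{1}{2}. \]

Accordingly

\[ \log (1 \cdot 2 \cdot 3 \cdots n) = C + \left(n + \tfrac{1}{2}\right)\log n - n + \frac{1}{12(n + \theta)}. \]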

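The constant $C$ depends in a remarkable way on the number $\pi$.
To show this we start from the well-known expression for $\pi$ due to Wallis:

\[ \frac{\pi}{2} = \lim_{n\to\infty}\left(\frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdots\frac{2n}{2n - 1}\cdot\frac{2n}{2n + 1}\right), \]

which follows from the infinite product

\[ \sin x = x\prod_{n=1}^{\infty}\left(1 - \frac{x^2}{n^2\pi^2}\right) \]

by taking $x = \pi/2$. Since

\[ \frac{2}{1}\cdot\frac{2}{3}\cdot\frac{4}{3}\cdot\frac{4}{5}\cdots\frac{2n}{2n - 1}\cdot\frac{2n}{2n + 1} = \left[\frac{2\cdot 4\cdot 6\cdots 2n}{1\cdot 3\cdot 5\cdots(2n - 1)}\right]^2\frac{1}{2n + 1} \]

we get from Wallis' formula

\[ \sqrt{\pi} = \lim_{n\to\infty}\left[\frac{2\cdot 4\cdot 6\cdots 2n}{1\cdot 3\cdot 5\cdots(2n - 1)\sqrt{n}}\right]. \]

On the other hand,

\[ 2\cdot 4\cdot 6\cdots 2n = 2^n\cdot 1\cdot 2\cdot 3\cdots n \]

\[ 1\cdot 3\cdot 5\cdots(2n - 1) = \frac{1\cdot 2\cdot 3\cdots 2n}{2^n\cdot 1\cdot 2\cdot 3\cdots n} \]

so that

\[ \sqrt{\pi} = \lim_{n\to\infty}\frac{2^{2n}(1\cdot 2\cdot 3\cdots n)^2}{1\cdot 2\cdot 3\cdots 2n\cdot\sqrt{n}} \]

or, taking logarithms

\[ \log\sqrt{\pi} = \lim_{n\to\infty}\left[2n\log 2 + 2\log (1\cdot 2\cdot 3\cdots n) - \log (1\cdot 2\cdot 3\cdots 2n) - \tfrac{1}{2}\log n\right]. \]

But, neglecting infinitesimals,

\[ \log (1\cdot 2\cdot 3\cdots n) = C + \left(n + \tfrac{1}{2}\right)\log n - n \]
\[ \log (1\cdot 2\cdot 3\cdots 2n) = C + \left(2n + \tfrac{1}{2}\right)\log 2n - 2n \]

whence

\[ \lim\left[2n\log 2 + 2\log (1\cdot 2\cdot 3\cdots n) - \log (1\cdot 2\cdot 3\cdots 2n) - \tfrac{1}{2}\log n\right] = C - \tfrac{1}{2}\log 2. \]

Thus

\[ \log\sqrt{\pi} = C - \tfrac{1}{2}\log 2, \quad C = \log\sqrt{2\pi} \]

and finally

\[ \log (1\cdot 2\cdot 3\cdots n) = \log\sqrt{2\pi} + \left(n + \tfrac{1}{2}\right)\log n - n + \frac{1}{12(n + \theta)}. \tag{3} \]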


This is equivalent to two inequalities

\[ e^{\frac{1}{12n + 6}} < \frac{1\cdot 2\cdot 3\cdots n}{\sqrt{2\pi n}\,n^n e^{-n}} < e^{\frac{1}{12n}} \]

which show that for indefinitely increasing $n$

\[ \lim\frac{1\cdot 2\cdot 3\cdots n}{\sqrt{2\pi n}\,n^n e^{-n}} = 1. \]

This result is commonly known as Stirling's formula.
For a finite $n$ we have

\[ 1\cdot 2\cdot 3\cdots n = \sqrt{2\pi n}\,n^n e^{-n}\cdot e^{\omega(n)} \]

where

\[ \frac{1}{12\left(n + \frac{1}{2}\right)} < \omega(n) < \frac{1}{12n}. \]

The expression

\[ \sqrt{2\pi n}\,n^n e^{-n} \]

is thus an approximate value of the factorial $1\cdot 2\cdot 3\cdots n$ for large $n$
in the sense that the ratio of both is near to 1; that is, the relative error is
small. On the contrary, the absolute error will be arbitrarily large for
large $n$, but this is irrelevant when Stirling's approximation is applied
to quotients of factorials.
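
The two-sided bound on $\omega(n) = \log (1\cdot 2\cdots n) - \left[\log\sqrt{2\pi} + (n + \tfrac{1}{2})\log n - n\right]$ can be checked directly for moderate $n$. The short sketch below does so with exact integer factorials; the function names are hypothetical and chosen only for this illustration.

```python
import math


def omega(n):
    """omega(n) = log n! - [log sqrt(2 pi) + (n + 1/2) log n - n]."""
    return (math.log(math.factorial(n))
            - (0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n))


def within_bounds(n):
    """True when 1/(12(n + 1/2)) < omega(n) < 1/(12 n)."""
    return 1 / (12 * (n + 0.5)) < omega(n) < 1 / (12 * n)


def stirling_ratio(n):
    """Ratio of n! to sqrt(2 pi n) n^n e^(-n); it equals exp(omega(n))."""
    return math.exp(omega(n))


if __name__ == "__main__":
    for n in (1, 5, 20, 50):
        print(n, omega(n), within_bounds(n), stirling_ratio(n))
```

The ratio tends to 1 as $n$ grows, while the difference between $n!$ and its approximation grows without bound, which is the distinction between relative and absolute error drawn above.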

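In this connection it is useful to derive two further inequalities.

Let $m < n$; we have, then,

\[ F_m(u) - F_n(u) = \sum_{k=m}^{n-1}\frac{2}{(k + u)(k + 1 - u)}, \]

and further, supposing $0 < u < \tfrac{1}{2}$,

\[ F_m(u) - F_n(u) < 2\sum_{k=m}^{n-1}\frac{1}{k(k + 1)} = \frac{2}{m} - \frac{2}{n} \]

\[ F_m(u) - F_n(u) > 2\sum_{k=m}^{n-1}\frac{1}{\left(k + \frac{1}{2}\right)\left(k + \frac{3}{2}\right)} = \frac{2}{m + \frac{1}{2}} - \frac{2}{n + \frac{1}{2}}. \]

Hence,

\[ \omega(m) - \omega(n) < \frac{1}{12m} - \frac{1}{12n} \]

\[ \omega(m) - \omega(n) > \frac{1}{12\left(m + \frac{1}{2}\right)} - \frac{1}{12\left(n + \frac{1}{2}\right)} \]

and, if $l$ is a third arbitrary positive integer,

\[ \omega(m) + \omega(l) - \omega(n) < \frac{1}{12m} + \frac{1}{12l} - \frac{1}{12n} \]

\[ \omega(m) + \omega(l) - \omega(n) > \frac{1}{12\left(m + \frac{1}{2}\right)} + \frac{1}{12\left(l + \frac{1}{2}\right)} - \frac{1}{12\left(n + \frac{1}{2}\right)}. \]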

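3. Some Definite Integrals. The value of the important definite
integral

\[ \int_0^\infty e^{-t^2}\,dt \]

can be found in various ways. One of the simplest is the following: Let

\[ J_n = \int_0^\infty e^{-t^2}t^n\,dt \]

in general where $n$ is an arbitrary integer $\ge 0$. Integrating by parts one
can easily establish the recurrence relation

\[ J_n = \frac{n - 1}{2}J_{n-2} \]

whence

\[ J_{2m} = \frac{1\cdot 3\cdot 5\cdots(2m - 1)}{2^m}J_0 \]

\[ J_{2m+1} = \frac{1\cdot 2\cdot 3\cdots m}{2}. \]

On the other hand,

\[ J_{n+1} + 2\lambda J_n + \lambda^2 J_{n-1} = \int_0^\infty e^{-t^2}t^{n-1}(t + \lambda)^2\,dt, \]

which shows that

\[ J_{n+1} + 2\lambda J_n + \lambda^2 J_{n-1} > 0 \]

for all real $\lambda$. Hence, the roots of the polynomial in the left member are
imaginary, and this implies

\[ J_n^2 < J_{n+1}J_{n-1}. \]

Taking $n = 2m$ and $n = 2m + 1$ and using the preceding expression
for $J_{2m}$ and $J_{2m+1}$, we find

\[ \frac{2\cdot 4\cdot 6\cdots 2m}{1\cdot 3\cdot 5\cdots(2m - 1)}\cdot\frac{1}{\sqrt{4m + 2}} < J_0 < \frac{2\cdot 4\cdot 6\cdots 2m}{1\cdot 3\cdot 5\cdots(2m - 1)}\cdot\frac{1}{\sqrt{4m}}. \]

But

\[ \lim_{m\to\infty}\frac{2\cdot 4\cdot 6\cdots 2m}{1\cdot 3\cdot 5\cdots(2m - 1)}\cdot\frac{1}{\sqrt{m}} = \sqrt{\pi}; \]

hence

\[ J_0 = \int_0^\infty e^{-t^2}\,dt = \tfrac{1}{2}\sqrt{\pi}. \]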

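Here substituting $t = \sqrt{a}\,u$, where $a$ is a positive parameter, we get

\[ \int_0^\infty e^{-au^2}\,du = \frac{1}{2}\sqrt{\frac{\pi}{a}}. \]

As a generalization of the last integral we may consider the following one:

\[ V = \int_0^\infty e^{-au^2}\cos bu\,du. \]

The simplest way to find the value of this integral is to take the derivative

\[ \frac{dV}{db} = -\int_0^\infty e^{-au^2}\sin bu\cdot u\,du \]

and transform the right member by partial integration. The result is

\[ \frac{dV}{db} = -\frac{b}{2a}V \]

or

\[ d\left(Ve^{\frac{b^2}{4a}}\right) = 0, \]

whence

\[ V = Ce^{-\frac{b^2}{4a}}. \]

To determine the constant $C$, take $b = 0$; then

\[ C = (V)_{b=0} = \frac{1}{2}\sqrt{\frac{\pi}{a}} \]

so that finally

\[ \int_0^\infty e^{-au^2}\cos bu\,du = \frac{1}{2}\sqrt{\frac{\pi}{a}}\,e^{-\frac{b^2}{4a}}. \]

The equivalent form of this integral is as follows:

\[ \int_{-\infty}^{\infty} e^{-au^2}\cos bu\,du = \int_{-\infty}^{\infty} e^{-au^2 + ibu}\,du = \sqrt{\frac{\pi}{a}}\,e^{-\frac{b^2}{4a}}. \]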
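
The closed form $\int_0^\infty e^{-au^2}\cos bu\,du = \frac{1}{2}\sqrt{\pi/a}\,e^{-b^2/4a}$ can be compared with a direct numerical quadrature. In the sketch below the infinite range is cut off at $u = 10/\sqrt{a}$, where the factor $e^{-au^2}$ is already of the order $e^{-100}$; this cut-off, the Simpson rule, and the function names are assumptions of the illustration.

```python
import math


def simpson(f, lo, hi, steps=4000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


def v_numeric(a, b):
    """Quadrature of exp(-a u^2) cos(b u) over 0 <= u <= 10 / sqrt(a)."""
    u_max = 10.0 / math.sqrt(a)
    return simpson(lambda u: math.exp(-a * u * u) * math.cos(b * u), 0.0, u_max)


def v_closed(a, b):
    """Closed form (1/2) sqrt(pi / a) exp(-b^2 / (4 a))."""
    return 0.5 * math.sqrt(math.pi / a) * math.exp(-b * b / (4 * a))


if __name__ == "__main__":
    for a, b in ((1.0, 0.0), (1.5, 2.0), (0.5, 3.0)):
        print(a, b, v_numeric(a, b), v_closed(a, b))
```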
METHOD OF MOMENTS AND ITS APPLICATIONS

1. Introductory Remarks. To prove the fundamental limit theorem
Tshebysheff devised an ingenious method, known as the "method of
moments," which later was completed and simplified by one of the most
prominent among Tshebysheff's disciples, the late Markoff. The
simplicity and elegance inherent in this method of moments make it
advisable to present in this Appendix a brief exposition of it.

The distribution of a mass spread over a given interval $(a, b)$ may be
characterized by a never decreasing function $\varphi(x)$, defined in $(a, b)$
and varying from $\varphi(a) = 0$ to $\varphi(b) = m_0$, where $m_0$ is the total mass contained
in $(a, b)$. Since $\varphi(x)$ is never decreasing, for any particular point
$x_0$, both the limits

\[ \lim \varphi(x_0 - \epsilon) = \varphi(x_0 - 0) \]
\[ \lim \varphi(x_0 + \epsilon) = \varphi(x_0 + 0) \]

exist when a positive number $\epsilon$ tends to 0. Evidently

\[ \varphi(x_0 - 0) \le \varphi(x_0) \le \varphi(x_0 + 0). \]

If

\[ \varphi(x_0 - 0) = \varphi(x_0 + 0) = \varphi(x_0), \]

then $x_0$ is a "point of continuity" of $\varphi(x)$. In case

\[ \varphi(x_0 + 0) > \varphi(x_0 - 0), \]

$x_0$ is a point of discontinuity of $\varphi(x)$, and the positive difference

\[ \varphi(x_0 + 0) - \varphi(x_0 - 0) \]

may be considered as a mass concentrated at the point $x_0$. In all cases
$\varphi(x_0 - 0)$ is the total mass on the segment $(a, x_0)$ excluding the end point
$x_0$, whereas $\varphi(x_0 + 0)$ is the mass spread over the same segment including
the point $x_0$.

The points of discontinuity, if there are any, form an enumerable set,
whence it follows that in any part of the interval $(a, b)$ there are points of
continuity.

If for any sufficiently small positive $\epsilon$

\[ \varphi(x_0 + \epsilon) > \varphi(x_0 - \epsilon), \]

$x_0$ is called a "point of increase" of $\varphi(x)$. There is at least one point of
increase and there might be infinitely many. For instance, if

\[ \varphi(x) = 0 \quad \text{for } a \le x \le c \]
\[ \varphi(x) = m_0 \quad \text{for } c < x \le b, \]

then $c$ is the only point of increase. On the other hand, for

\[ \varphi(x) = m_0\,\frac{x - a}{b - a} \]

every point of the interval $(a, b)$ is a point of increase. In case of a
finite number of points of increase the whole mass is concentrated in
these points and the distribution function $\varphi(x)$ is a step function with a
finite number of steps.

Stieltjes' integrals

\[ \int_a^b d\varphi(x) = m_0, \quad \int_a^b x\,d\varphi(x) = m_1, \quad \int_a^b x^2\,d\varphi(x) = m_2, \ \ldots \ \int_a^b x^i\,d\varphi(x) = m_i \]

represent respectively the whole mass $m_0$ and its moments about the
origin of the order 1, 2, . . . $i$. When the distribution function $\varphi(x)$
is given, moments $m_0, m_1, m_2, \ldots m_i$ (provided they exist) are determined.
If, however, these moments are given and are known to originate
in a certain distribution of a mass over $(a, b)$, the question may be raised
with what error the mass spread over an interval $(a, x)$ can be determined
by these data? In other words, given $m_0, m_1, m_2, \ldots m_i$, what are the
precise upper and lower bounds of a mass spread over an interval $(a, x)$?
Such is the question raised by Tshebysheff in a short but important article
"Sur les valeurs limites des intégrales" (1874).¹ The results contained
in this article, including very remarkable inequalities which indeed are of
fundamental importance, are given without proof. The first proof of
these results and the complete solution of the question raised by Tshebysheff
was given by Markoff in his eminent thesis "On some applications
of algebraic continued fractions" (St. Petersburg, 1884), written in
Russian and therefore comparatively little known.

¹ Jour. Liouville, Ser. 2, T. XIX, 1874.

Suppose that $p_i$ is the limit of the error with which we can evaluate the
mass belonging to the interval $(a, x)$ or, which is almost the same, the
value of $\varphi(x)$, when moments $m_0, m_1, m_2, \ldots m_i$ are given. If, with $i$
tending to infinity, $p_i$ tends to 0 for any given $x$, then the distribution
function $\varphi(x)$ will be completely determined by giving all the moments

\[ m_0, m_1, m_2, \ldots \]

One case of this kind, that in which

\[ m_0 = 1, \quad m_{2k} = \frac{1\cdot 3\cdot 5\cdots(2k - 1)}{2^k}, \quad m_{2k+1} = 0 \]

was considered by Tshebysheff in a later paper, "Sur deux théorèmes
relatifs aux probabilités" (1887)¹ devoted to the application of his
method to the proof of the limit theorem under certain rather general
conditions. The success of this proof is due to the fact that moments,
as given above, uniquely determine the normal distribution

\[ \varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du \]

of the mass 1 over the infinite interval $(-\infty, +\infty)$.

After these preliminary remarks and before proceeding to an orderly
exposition of the method of moments, it is advisable to devote a few pages
to continued fractions associated with power series, for continued fractions
are the natural tools in questions of the kind we shall consider.

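2. Continued Fractions Associated with Power Series. Let

\[ \phi(z) = \frac{A_1}{z^{\alpha_1}} + \frac{A_2}{z^{\alpha_2}} + \frac{A_3}{z^{\alpha_3}} + \cdots; \quad (A_1 \neq 0) \]

be a power series arranged according to decreasing powers of $z$ where the
smallest exponent $\alpha_1$ is positive. We consider this power series from a
purely formal point of view merely as a means to form a sequence of
rational fractions

\[ \frac{A_1}{z^{\alpha_1}}, \quad \frac{A_1}{z^{\alpha_1}} + \frac{A_2}{z^{\alpha_2}}, \quad \frac{A_1}{z^{\alpha_1}} + \frac{A_2}{z^{\alpha_2}} + \frac{A_3}{z^{\alpha_3}}, \ \ldots \]

and we need not be concerned about its convergence.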

Evidently $1/\phi(z)$ can again be expanded into power series, arranged
according to decreasing powers of $z$. Let its integral part, containing
non-negative powers of $z$, be denoted by $q_1(z)$, and let the fractional part

\[ \frac{B_1}{z^{\beta_1}} + \frac{B_2}{z^{\beta_2}} + \cdots \]

containing negative powers of $z$, be denoted by $-\phi_1(z)$, so that

\[ \frac{1}{\phi(z)} = q_1(z) - \phi_1(z). \]

In the same way

\[ \frac{1}{\phi_1(z)} \]

can be represented thus:

\[ \frac{1}{\phi_1(z)} = q_2(z) - \phi_2(z) \]

where $q_2(z)$ is a polynomial and $\phi_2(z)$ a power series containing only negative powers of $z$. Further, we shall
have

\[ \frac{1}{\phi_2(z)} = q_3(z) - \phi_3(z) \]

with a certain polynomial $q_3(z)$ and a power series $\phi_3(z)$
containing negative powers of $z$, and so on. Thus we are led to consider a
continued fraction (finite or infinite)

\[ \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cfrac{1}{q_3 - \cdots}}} \tag{1} \]

associated with $\phi(z)$ in the sense that the formal expansion of

\[ \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cdots - \cfrac{1}{q_i - \phi_i(z)}}} \]

into a power series will reproduce exactly $\phi(z)$. The continued fraction
(1) is again considered from a purely formal standpoint as a mere abbreviation
of the sequence of its convergents

\[ \frac{P_1}{Q_1} = \frac{1}{q_1}, \quad \frac{P_2}{Q_2} = \cfrac{1}{q_1 - \cfrac{1}{q_2}}, \quad \frac{P_3}{Q_3} = \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cfrac{1}{q_3}}}, \ \ldots \]

¹ Oeuvres complètes de P. L. Tshebysheff, Tome 2, p. 482.

The polynomials

\[ P_1, P_2, P_3, \ldots \]
\[ Q_1, Q_2, Q_3, \ldots \]

can be found step by step by the recurrence relations

\[ \begin{aligned} P_i &= q_iP_{i-1} - P_{i-2}, & P_1 &= 1, & P_0 &= 0 \\ Q_i &= q_iQ_{i-1} - Q_{i-2}, & Q_1 &= q_1, & Q_0 &= 1 \end{aligned} \qquad (i = 2, 3, 4, \ldots) \tag{2} \]

from which the following identical relation follows:

\[ P_i(z)Q_{i-1}(z) - Q_i(z)P_{i-1}(z) = 1, \tag{3} \]

showing that all fractions

\[ \frac{P_i(z)}{Q_i(z)} \]

are irreducible. Evidently degrees of consecutive denominators of
convergents form an increasing sequence and the degree of $Q_i(z)$ is at
least $i$. Since

\[ \phi(z) = \cfrac{1}{q_1 - \cfrac{1}{q_2 - \cdots - \cfrac{1}{q_{i+1} - \phi_{i+1}(z)}}} \]

we can write

\[ \phi(z) = \frac{P_i\left(q_{i+1} - \phi_{i+1}(z)\right) - P_{i-1}}{Q_i\left(q_{i+1} - \phi_{i+1}(z)\right) - Q_{i-1}} = \frac{P_{i+1} - P_i\phi_{i+1}(z)}{Q_{i+1} - Q_i\phi_{i+1}(z)} \]

in the sense that the formal development of the right-hand member is
identical with $\phi(z)$. By virtue of relation (3)

\[ \phi(z) - \frac{P_i}{Q_i} = \frac{1}{Q_i\left(Q_{i+1} - Q_i\phi_{i+1}(z)\right)}. \]

The degree of $Q_i$ being $\lambda_i$, and that of $Q_{i+1}$ being $\lambda_{i+1}$, the expansion of

\[ \frac{1}{Q_i\left(Q_{i+1} - Q_i\phi_{i+1}(z)\right)} \]

in a series of descending powers of $z$ begins with the power $z^{-\lambda_i - \lambda_{i+1}}$.
Hence,

\[ \phi(z) - \frac{P_i}{Q_i} = \frac{M}{z^{\lambda_i + \lambda_{i+1}}} + \cdots \]

and, since $\lambda_{i+1} \ge \lambda_i + 1$, the expansion of

\[ \phi(z) - \frac{P_i}{Q_i} \]

begins with a term of the order $2\lambda_i + 1$ in $1/z$ at least. This property
characterizes the convergents $P_i/Q_i$ completely. For let $P/Q$ be a
rational fraction whose denominator is of the $n$th degree and such that
in the expansion of

\[ \phi(z) - \frac{P}{Q} \]

the lowest term is of the order $2n + 1$ in $1/z$ at least. Then $P/Q$ coincides
with one of the convergents to the continued fraction (1). Let $i$ be
determined by the condition

\[ \lambda_i \le n < \lambda_{i+1}. \]

Then

\[ \phi(z) - \frac{P_i}{Q_i} = \frac{M}{z^{\lambda_i + \lambda_{i+1}}} + \cdots \]

\[ \phi(z) - \frac{P}{Q} = \frac{N}{z^{2n+1}} + \cdots \]

whence in the expansion of

\[ \frac{P}{Q} - \frac{P_i}{Q_i} \]

the lowest term will be of degree $2n + 1$ or $\lambda_i + \lambda_{i+1}$ in $1/z$. Hence, the
degree of

\[ PQ_i - P_iQ \]

in $z$ does not exceed the greater of the two numbers

\[ \lambda_i - n - 1 \quad \text{and} \quad n - \lambda_{i+1} \]

which are both negative while

\[ PQ_i - P_iQ \]

is a polynomial. Hence, identically,

\[ PQ_i - P_iQ = 0 \]

or

\[ \frac{P}{Q} = \frac{P_i}{Q_i} \]

which proves the statement.
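
The recurrence relations (2) and the identity (3) lend themselves to a direct check. The following sketch builds the numerators and denominators of the convergents from a given list of partial quotients $q_1, q_2, \ldots$ and verifies that $P_iQ_{i-1} - Q_iP_{i-1} = 1$. Polynomials are stored as coefficient lists, lowest degree first; this representation, the helper names, and the particular quotients used are assumptions of the illustration.

```python
from fractions import Fraction


def trim(p):
    """Drop zero coefficients of highest degree (lowest degree comes first)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def poly_sub(p, q):
    size = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (size - len(p))
    q = list(q) + [Fraction(0)] * (size - len(q))
    return trim([a - b for a, b in zip(p, q)])


def convergents(partial_quotients):
    """Numerators P[i] and denominators Q[i], i = 0, 1, ..., by relations (2)."""
    P = [[Fraction(0)], [Fraction(1)]]                 # P_0 = 0, P_1 = 1
    Q = [[Fraction(1)], trim(partial_quotients[0])]    # Q_0 = 1, Q_1 = q_1
    for q in partial_quotients[1:]:
        P.append(poly_sub(poly_mul(q, P[-1]), P[-2]))
        Q.append(poly_sub(poly_mul(q, Q[-1]), Q[-2]))
    return P, Q


# Illustrative partial quotients q_1 = z, q_2 = 2z, q_3 = z/2, q_4 = 4z/3.
QUOTIENTS = [[Fraction(0), Fraction(c)]
             for c in (1, 2, Fraction(1, 2), Fraction(4, 3))]
P, Q = convergents(QUOTIENTS)

# Left member of identity (3) for i = 1, 2, 3, 4.
identity_values = [poly_sub(poly_mul(P[i], Q[i - 1]), poly_mul(Q[i], P[i - 1]))
                   for i in range(1, len(P))]
```

Exact rational arithmetic is used so that the identity holds exactly rather than up to rounding.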

3. Continued Fraction Associated with $\int_a^b \frac{d\varphi(x)}{z - x}$. Let $\varphi(x)$ be a never
decreasing function characterizing the distribution of a mass over an
interval $(a, b)$. The moments of this distribution up to the moment of
the order $2n$ are represented by integrals

\[ m_0 = \int_a^b d\varphi(x), \quad m_1 = \int_a^b x\,d\varphi(x), \quad m_2 = \int_a^b x^2\,d\varphi(x), \ \ldots \ m_{2n} = \int_a^b x^{2n}\,d\varphi(x). \]
Let

\[ \Delta_0 = m_0, \quad \Delta_1 = \begin{vmatrix} m_0 & m_1 \\ m_1 & m_2 \end{vmatrix}, \quad \Delta_2 = \begin{vmatrix} m_0 & m_1 & m_2 \\ m_1 & m_2 & m_3 \\ m_2 & m_3 & m_4 \end{vmatrix}; \ \ldots \ \Delta_n = \begin{vmatrix} m_0 & m_1 & \cdots & m_n \\ m_1 & m_2 & \cdots & m_{n+1} \\ \cdots & \cdots & \cdots & \cdots \\ m_n & m_{n+1} & \cdots & m_{2n} \end{vmatrix}. \]

If $\varphi(x)$ has not less than $n + 1$ points of increase, we must have

\[ \Delta_0 > 0, \ \Delta_1 > 0, \ \ldots \ \Delta_n > 0, \]

and conversely, if these inequalities are satisfied, $\varphi(x)$ has at least $n + 1$
points of increase. To prove this, consider the quadratic form

\[ \phi = \int_a^b (t_0 + t_1x + \cdots + t_nx^n)^2\,d\varphi(x) \]

in $n + 1$ variables $t_0, t_1, \ldots t_n$. Evidently

\[ \phi = \sum m_{i+j}t_it_j \quad (i, j = 0, 1, 2, \ldots n) \]

so that $\Delta_n$ is the determinant of $\phi$ and $\Delta_0, \Delta_1, \ldots \Delta_{n-1}$ its principal
minors. The form $\phi$ cannot vanish unless $t_0 = t_1 = \cdots = t_n = 0$.

For if $x = \xi$ is a point of increase and $\phi = 0$, we must have also

\[ \int_{\xi - \epsilon}^{\xi + \epsilon} (t_0 + t_1x + \cdots + t_nx^n)^2\,d\varphi(x) = 0 \]

for an arbitrary positive $\epsilon$, whence by the mean value theorem

\[ (t_0 + t_1\eta + \cdots + t_n\eta^n)^2\int_{\xi - \epsilon}^{\xi + \epsilon} d\varphi(x) = 0 \quad (\xi - \epsilon < \eta < \xi + \epsilon) \]

or

\[ t_0 + t_1\eta + \cdots + t_n\eta^n = 0 \]

because

\[ \int_{\xi - \epsilon}^{\xi + \epsilon} d\varphi(x) > 0. \]

Letting $\epsilon$ converge to 0, we conclude

\[ t_0 + t_1\xi + \cdots + t_n\xi^n = 0 \]

at any point of increase. Since there are at least $n + 1$ points of increase
the equation

\[ t_0 + t_1x + \cdots + t_nx^n = 0 \]

would have at least $n + 1$ roots and that necessitates

\[ t_0 = t_1 = \cdots = t_n = 0. \]

Hence, the quadratic form $\phi$, which is never negative, can vanish
only if all its variables vanish; that is, $\phi$ is a definite positive form. Its
determinant $\Delta_n$ and all its principal minors $\Delta_{n-1}, \Delta_{n-2}, \ldots \Delta_0$ must be
positive, which proves the first statement.

Suppose the conditions

\[ \Delta_0 > 0, \ \Delta_1 > 0, \ \ldots \ \Delta_n > 0 \]

satisfied and let $\varphi(x)$ have $s < n + 1$ points of increase. Then the
integral representing $\phi$ reduces to a finite sum

\[ \phi = p_1(t_0 + t_1\xi_1 + \cdots + t_n\xi_1^n)^2 + p_2(t_0 + t_1\xi_2 + \cdots + t_n\xi_2^n)^2 + \cdots + p_s(t_0 + t_1\xi_s + \cdots + t_n\xi_s^n)^2 \]

denoting by $p_1, p_2, \ldots p_s$ masses concentrated in the $s$ points of
increase $\xi_1, \xi_2, \ldots \xi_s$. Now, since $s \le n$, constants $t_0, t_1, \ldots t_n$, not
all zero, can be determined by the system of equations

\[ t_0 + t_1\xi_1 + \cdots + t_n\xi_1^n = 0 \]
\[ t_0 + t_1\xi_2 + \cdots + t_n\xi_2^n = 0 \]
\[ \cdots \]
\[ t_0 + t_1\xi_s + \cdots + t_n\xi_s^n = 0. \]

Thus $\phi$ vanishes when not all variables vanish; hence, its determinant
$\Delta_n = 0$, contrary to hypothesis.
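
Both halves of this statement can be illustrated on small examples. The sketch below computes the determinants $\Delta_0, \Delta_1, \ldots$ from a list of moments in exact arithmetic. It is applied to the moments $m_{2k} = 1\cdot 3\cdots(2k-1)/2^k$, $m_{2k+1} = 0$ quoted in Sec. 1 (infinitely many points of increase) and to the moments of two equal masses placed at $\pm 1$ (only two points of increase, so that $\Delta_2$ must vanish). The function names and the choice of examples are assumptions of the illustration.

```python
from fractions import Fraction


def determinant(rows):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in rows]
    n, det = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def hankel_determinants(moments):
    """Delta_0, ..., Delta_n built from the moments m_0, ..., m_2n."""
    n = (len(moments) - 1) // 2
    return [determinant([[moments[i + j] for j in range(k + 1)]
                         for i in range(k + 1)])
            for k in range(n + 1)]


# Moments m_0 .. m_6 of the distribution with density exp(-x^2) / sqrt(pi).
NORMAL_MOMENTS = [1, 0, Fraction(1, 2), 0, Fraction(3, 4), 0, Fraction(15, 8)]
# Moments m_0 .. m_4 of masses 1/2 placed at x = -1 and x = +1.
TWO_POINT_MOMENTS = [1, 0, 1, 0, 1]

if __name__ == "__main__":
    print(hankel_determinants(NORMAL_MOMENTS))
    print(hankel_determinants(TWO_POINT_MOMENTS))
```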

From now on we shall assume that $\varphi(x)$ has at least $n + 1$ points of
increase. The integral

\[ \int_a^b \frac{d\varphi(x)}{z - x} \]

can be expanded into a formal power series of $1/z$, thus

\[ \int_a^b \frac{d\varphi(x)}{z - x} = \frac{m_0}{z} + \frac{m_1}{z^2} + \frac{m_2}{z^3} + \cdots \]

and this power series can be converted into a continued fraction as
explained in Sec. 2. Let

\[ \frac{P_1}{Q_1}, \ \frac{P_2}{Q_2}, \ \ldots \ \frac{P_n}{Q_n}, \ \frac{P_{n+1}}{Q_{n+1}} \]

be the first $n + 1$ convergents to that continued fraction. I say that the
degrees of their denominators are, respectively, 1, 2, 3, . . . $n + 1$.
Since these degrees form an increasing sequence, it suffices to show that
there exists a convergent with the denominator of a given degree

\[ s \le n + 1. \]

This convergent $P/Q$ is completely determined by the condition that in a
formal expansion of the difference

\[ \int_a^b \frac{d\varphi(x)}{z - x} - \frac{P(z)}{Q(z)} \]

into a power series of $1/z$, terms involving $1/z, 1/z^2, \ldots 1/z^{2s}$ are
absent. This is the same as to say that in the expansion of

\[ Q(z)\int_a^b \frac{d\varphi(x)}{z - x} - P(z) \]

there are no terms involving $1/z, 1/z^2, \ldots 1/z^s$. The preceding expression
can be written thus:

\[ \int_a^b \frac{Q(x)\,d\varphi(x)}{z - x} + \int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x) - P(z). \]

Since

\[ \int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x) - P(z) \]

is a polynomial in $z$, it must vanish identically. That gives

\[ P(z) = \int_a^b \frac{Q(z) - Q(x)}{z - x}\,d\varphi(x). \tag{4} \]

To determine $Q(z)$ we must express the conditions that in the expansion of

\[ \int_a^b \frac{Q(x)\,d\varphi(x)}{z - x} \]

terms in $1/z, 1/z^2, \ldots 1/z^s$ vanish. These conditions are equivalent to
$s$ relations

\[ \int_a^b Q(x)\,d\varphi(x) = 0, \quad \int_a^b xQ(x)\,d\varphi(x) = 0, \ \ldots \ \int_a^b x^{s-1}Q(x)\,d\varphi(x) = 0, \tag{5} \]

which in turn amount to the single requirement that

\[ \int_a^b \theta(x)Q(x)\,d\varphi(x) = 0 \tag{6} \]

for an arbitrary polynomial $\theta(x)$ of degree $\le s - 1$.

Conversely, if there exists a polynomial $Q(z)$ of degree $s$ satisfying conditions
(5), and $P(z)$ is determined by equation (4), then $P(z)/Q(z)$ is a
convergent whose denominator is of degree $s$. For then the expansion of

\[ \int_a^b \frac{d\varphi(x)}{z - x} - \frac{P(z)}{Q(z)} \]

lacks the terms in $1/z, 1/z^2, \ldots 1/z^{2s}$.
Let

\[ Q(z) = l_0 + l_1z + \cdots + l_{s-1}z^{s-1} + z^s. \]

Then equations (5) become

\[ m_0l_0 + m_1l_1 + m_2l_2 + \cdots + m_{s-1}l_{s-1} + m_s = 0 \]
\[ m_1l_0 + m_2l_1 + m_3l_2 + \cdots + m_sl_{s-1} + m_{s+1} = 0 \]
\[ \cdots \]
\[ m_{s-1}l_0 + m_sl_1 + m_{s+1}l_2 + \cdots + m_{2s-2}l_{s-1} + m_{2s-1} = 0. \]

This system of linear equations determines completely the coefficients
$l_0, l_1, \ldots l_{s-1}$ since its determinant $\Delta_{s-1} > 0$.

The existence of a convergent with the denominator of degree

\[ s \le n + 1 \]

being established, it follows that the denominator of the $s$th convergent
$P_s/Q_s$ is exactly of degree $s$. The denominator $Q_s$ is determined, except
for a constant factor, and can be presented in the form:

\[ Q_s = c_s\begin{vmatrix} 1 & z & z^2 & \cdots & z^s \\ m_0 & m_1 & m_2 & \cdots & m_s \\ m_1 & m_2 & m_3 & \cdots & m_{s+1} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ m_{s-1} & m_s & m_{s+1} & \cdots & m_{2s-1} \end{vmatrix}. \]

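A remarkable result follows from equation (6) by taking $Q = Q_s$ and
$\theta = Q_{s'}$; namely,

\[ \int_a^b Q_s(x)Q_{s'}(x)\,d\varphi(x) = 0 \quad \text{if } s' < s \tag{7} \]

while

\[ \int_a^b Q_s^2\,d\varphi(x) > 0 \quad (s \le n). \]

In the general relation

\[ Q_s = q_sQ_{s-1} - Q_{s-2} \]

the polynomial $q_s$ must be of the first degree

\[ q_s = \alpha_sz + \beta_s \]

which shows that the continued fraction associated with

\[ \int_a^b \frac{d\varphi(x)}{z - x} \]

has the form

\[ \cfrac{1}{\alpha_1z + \beta_1 - \cfrac{1}{\alpha_2z + \beta_2 - \cfrac{1}{\alpha_3z + \beta_3 - \cdots}}} \]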


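The next question is, how to determine the constants $\alpha_s$ and $\beta_s$. Multiplying
both members of the equation

\[ Q_s = (\alpha_sz + \beta_s)Q_{s-1} - Q_{s-2} \quad (s \ge 2) \]

by $Q_{s-2}\,d\varphi(z)$, integrating between limits $a$ and $b$, and taking into account
(7), we get

\[ 0 = \alpha_s\int_a^b zQ_{s-1}Q_{s-2}\,d\varphi(z) - \int_a^b Q_{s-2}^2\,d\varphi(z). \]

On the other hand, the highest terms in $Q_{s-1}$ and $Q_{s-2}$ are

\[ \alpha_1\alpha_2\cdots\alpha_{s-1}z^{s-1}, \quad \alpha_1\alpha_2\cdots\alpha_{s-2}z^{s-2}. \]

Hence,

\[ zQ_{s-2} = \frac{Q_{s-1}}{\alpha_{s-1}} + \psi \]

where $\psi$ is a polynomial of degree $\le s - 2$. Referring to equation (6),
we have

\[ \int_a^b zQ_{s-2}Q_{s-1}\,d\varphi(z) = \frac{1}{\alpha_{s-1}}\int_a^b Q_{s-1}^2\,d\varphi(z) \]

and consequently

\[ \alpha_s = \alpha_{s-1}\,\frac{\int_a^b Q_{s-2}^2\,d\varphi(z)}{\int_a^b Q_{s-1}^2\,d\varphi(z)}. \tag{8} \]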

Suppose that the following moments are given: $m_0, m_1, \ldots m_{2n}$; how
many of the coefficients $\alpha_s$ can be found? Evidently $\alpha_1 = 1/m_0$. Furthermore,
$Q_0 = 1$ and $Q_1$ is completely determined given $m_0$ and $m_1$.
Relation (8) determines $\alpha_2$, and $Q_2$ will be completely determined given
$m_0, m_1, m_2, m_3$. The same relation again determines $\alpha_3$, and $Q_3$ will be
determined given $m_0, m_1, \ldots m_5$. Proceeding in the same way, we
conclude that, given $m_0, m_1, m_2, \ldots m_{2n}$, all the polynomials

\[ Q_0, Q_1, Q_2, \ldots Q_n \]

as well as constants

\[ \alpha_1, \alpha_2, \alpha_3, \ldots \alpha_{n+1} \]

can be determined. It is important to note that all these constants are
positive.

Proceeding in a similar manner, the following expression can be found

\[ \beta_s = -\alpha_s\,\frac{\int_a^b zQ_{s-1}^2\,d\varphi(z)}{\int_a^b Q_{s-1}^2\,d\varphi(z)}. \]

It follows that constants

\[ \beta_1, \beta_2, \ldots \beta_n \]

are determined by our data, but not $\beta_{n+1}$. For if $s = n + 1$, the integral

\[ \int_a^b zQ_n^2\,d\varphi(z) \]

can be expressed as a linear function of $m_0, m_1, \ldots m_{2n+1}$ with known
coefficients. But $m_{2n+1}$ is not included among our data; hence, $\beta_{n+1}$
cannot be determined.
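
The step-by-step determination of $\alpha_s$, $\beta_s$ and $Q_s$ just described can be sketched as follows. Integrals of the form $\int p(x)q(x)\,d\varphi(x)$ are expressed through the moments, $\alpha_s$ is taken as $\alpha_{s-1}\int Q_{s-2}^2\,d\varphi / \int Q_{s-1}^2\,d\varphi$ with $\alpha_1 = 1/m_0$, and $\beta_s$ as $-\alpha_s\int zQ_{s-1}^2\,d\varphi / \int Q_{s-1}^2\,d\varphi$. Polynomials are coefficient lists, lowest degree first; the function names and the example moments (those of Sec. 1) are assumptions of this illustration.

```python
from fractions import Fraction


def inner(p, q, m):
    """Integral of p(x) q(x) d(phi)(x), expressed through the moments m[k]."""
    return sum(a * b * m[i + j]
               for i, a in enumerate(p) for j, b in enumerate(q))


def denominators(moments, n):
    """Return (Q, alpha, beta) with Q[0..n]; alpha[s], beta[s] for s = 1..n.

    Index 0 of alpha and beta is unused so that alpha[s] stands for alpha_s.
    """
    m = [Fraction(x) for x in moments]
    Q = [[Fraction(1)]]                       # Q_0 = 1
    alpha, beta = [None], [None]
    for s in range(1, n + 1):
        prev = Q[s - 1]
        norm_prev = inner(prev, prev, m)
        if s == 1:
            a = 1 / m[0]                      # alpha_1 = 1 / m_0
        else:
            a = alpha[s - 1] * inner(Q[s - 2], Q[s - 2], m) / norm_prev
        b = -a * inner([Fraction(0)] + prev, prev, m) / norm_prev
        new = [Fraction(0)] * (s + 1)
        for i, c in enumerate(prev):          # (alpha_s z + beta_s) Q_{s-1}
            new[i] += b * c
            new[i + 1] += a * c
        if s >= 2:
            for i, c in enumerate(Q[s - 2]):  # minus Q_{s-2}
                new[i] -= c
        Q.append(new)
        alpha.append(a)
        beta.append(b)
    return Q, alpha, beta


# Moments m_0 .. m_8 with m_2k = 1*3*...*(2k-1) / 2^k and m_(2k+1) = 0.
NORMAL_MOMENTS = [Fraction(1), 0, Fraction(1, 2), 0, Fraction(3, 4), 0,
                  Fraction(15, 8), 0, Fraction(105, 16)]

if __name__ == "__main__":
    Q, alpha, beta = denominators(NORMAL_MOMENTS, 4)
    print(alpha[1:], beta[1:])
    print(Q[4])
```

For these moments the sketch yields positive $\alpha_s$ and vanishing $\beta_s$, the latter reflecting the symmetry of the distribution about the origin.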

4. Properties of Polynomials $Q_s$. Theorem. Roots of the equation

\[ Q_s(z) = 0 \quad (s \le n) \]

are real, simple, and contained within the interval $(a, b)$.

Proof. Let $Q_s(z)$ change its sign $r < s$ times when $z$ passes through
points $z_1, z_2, \ldots z_r$ contained strictly within $(a, b)$. Setting

\[ \theta(z) = (z - z_1)(z - z_2)\cdots(z - z_r) \]

the product

\[ \theta(z)Q_s(z) \]

does not change its sign when $z$ increases from $a$ to $b$. However,

\[ \int_a^b \theta(z)Q_s(z)\,d\varphi(z) = 0, \]

and this necessitates that

\[ \theta(z)Q_s(z) \]

or $Q_s(z)$ vanishes in all points of increase of $\varphi(z)$. But this is impossible,
since by hypothesis there are at least $n + 1$ points of increase, whereas
the degree $s$ of $Q_s$ does not exceed $n$. Consequently, $Q_s(z)$ changes its
sign in the interval $(a, b)$ exactly $s$ times and has all its roots real, simple,
and located within $(a, b)$.

It follows from this theorem that the convergent

\[ \frac{P_n(z)}{Q_n(z)} \]

can be resolved into a sum of simple fractions as follows:

\[ \frac{P_n(z)}{Q_n(z)} = \frac{A_1}{z - z_1} + \frac{A_2}{z - z_2} + \cdots + \frac{A_n}{z - z_n} \tag{9} \]

where $z_1, z_2, \ldots z_n$ are roots of the equation $Q_n(z) = 0$ and in general

\[ A_k = \frac{P_n(z_k)}{Q_n'(z_k)}. \]

The right member of (9) can be expanded into power series of $1/z$, the
coefficient of $1/z^k$ being

\[ \sum_{\alpha=1}^{n} A_\alpha z_\alpha^{k-1}. \]

By the property of convergents we must have the following equations:

\[ \sum_{\alpha=1}^{n} A_\alpha = m_0 \]
\[ \sum_{\alpha=1}^{n} A_\alpha z_\alpha = m_1 \]
\[ \cdots \]
\[ \sum_{\alpha=1}^{n} A_\alpha z_\alpha^{2n-1} = m_{2n-1}. \]

These equations can be condensed into one,

\[ \sum_{\alpha=1}^{n} A_\alpha T(z_\alpha) = \int_a^b T(z)\,d\varphi(z) \tag{10} \]

which should hold for any polynomial $T(z)$ of degree $\le 2n - 1$.
Let us take for $T(z)$ a polynomial of degree $2n - 2$:

\[ T(z) = \left[\frac{Q_n(z)}{(z - z_\alpha)Q_n'(z_\alpha)}\right]^2. \]

Then

\[ T(z_\alpha) = 1, \quad T(z_\beta) = 0 \quad \text{if } \beta \neq \alpha \]

and consequently, by virtue of equation (10),

\[ A_\alpha = \int_a^b \left[\frac{Q_n(z)}{(z - z_\alpha)Q_n'(z_\alpha)}\right]^2 d\varphi(z) > 0. \]

Thus constants $A_1, A_2, \ldots A_n$ are all positive, which shows that $P_n(z_k)$
has the same sign as $Q_n'(z_k)$. Now in the sequence

\[ Q_n'(z_1), \ Q_n'(z_2), \ \ldots \ Q_n'(z_n) \]

any two consecutive terms are of opposite signs. The same being true of
the sequence

\[ P_n(z_1), \ P_n(z_2), \ \ldots \ P_n(z_n), \]

it follows that the roots of $P_n(z)$ are all simple, real, and located in the
intervals

\[ (z_1, z_2); \ (z_2, z_3); \ \ldots \ (z_{n-1}, z_n). \]
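
A small numerical illustration of these statements, under stated assumptions: for the moments $m_{2k} = 1\cdot 3\cdots(2k-1)/2^k$, $m_{2k+1} = 0$ of Sec. 1 and $n = 3$, the recurrence relations give $Q_3(z) = z^3 - \tfrac{3}{2}z$ and $P_3(z) = z^2 - 1$ (these two polynomials are taken here as given; they are worked out by hand for the illustration). The sketch computes $A_k = P_3(z_k)/Q_3'(z_k)$ and the sums $\sum A_\alpha z_\alpha^k$, which should reproduce $m_0, \ldots m_5$ but not $m_6$. The helper names are hypothetical.

```python
import math


def poly_eval(coeffs, x):
    """Horner evaluation; coefficients are listed lowest degree first."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * x + c
    return value


def poly_derivative(coeffs):
    return [k * c for k, c in enumerate(coeffs)][1:]


def weights(P, Q, roots):
    """A_k = P(z_k) / Q'(z_k) at each root z_k of Q."""
    dQ = poly_derivative(Q)
    return [poly_eval(P, z) / poly_eval(dQ, z) for z in roots]


# Assumed example: Q_3(z) = z^3 - (3/2) z and P_3(z) = z^2 - 1.
Q3 = [0.0, -1.5, 0.0, 1.0]
P3 = [-1.0, 0.0, 1.0]
ROOTS = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
A = weights(P3, Q3, ROOTS)


def point_moment(k):
    """Moment of order k of the masses A placed at the roots."""
    return sum(a * z ** k for a, z in zip(A, ROOTS))


if __name__ == "__main__":
    print(A)
    print([point_moment(k) for k in range(7)])
```

The roots of $P_3$, namely $\pm 1$, fall in the intervals between consecutive roots of $Q_3$, in agreement with the conclusion above.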

Finally, we shall prove the following theorem:

Theorem. For any real $x$

\[ Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) \]

is a positive number.

Proof. From the relations

\[ Q_s(z) = (\alpha_sz + \beta_s)Q_{s-1}(z) - Q_{s-2}(z) \]
\[ Q_s(x) = (\alpha_sx + \beta_s)Q_{s-1}(x) - Q_{s-2}(x) \]

it follows that

\[ \frac{Q_s(z)Q_{s-1}(x) - Q_s(x)Q_{s-1}(z)}{z - x} = \alpha_sQ_{s-1}(z)Q_{s-1}(x) + \frac{Q_{s-1}(z)Q_{s-2}(x) - Q_{s-1}(x)Q_{s-2}(z)}{z - x} \]

whence, taking $s = 1, 2, 3, \ldots n$ and adding results,

\[ \frac{Q_n(z)Q_{n-1}(x) - Q_n(x)Q_{n-1}(z)}{z - x} = \sum_{s=1}^{n} \alpha_sQ_{s-1}(z)Q_{s-1}(x). \]

It suffices now to take $z = x$ to arrive at the identity

\[ Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) = \sum_{s=1}^{n} \alpha_sQ_{s-1}(x)^2. \]

Since $Q_0 = 1$ and $\alpha_s > 0$, it is evident that

\[ Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) > 0 \]

for every real $x$.

5. Equivalent Point Distributions. If the whole mass can be concentrated
in a finite number of points so as to produce the same $l$ first
moments as a given distribution, we have an “equivalent point distribution”
in respect to the $l$ first moments. In what follows we shall suppose
that the whole mass is spread over an infinite interval $-\infty, +\infty$ and that
the given moments, originating in a distribution with at least $n + 1$
points of increase, are

$$m_0, m_1, m_2, \ldots m_{2n}.$$

The question is: Is it possible to find an equivalent point distribution
where the whole mass is concentrated in $n + 1$ points? Let the unknown
points be

$$\xi_1, \xi_2, \ldots \xi_{n+1}$$

and the masses concentrated in them

$$A_1, A_2, \ldots A_{n+1}.$$


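Evidently the question will be answered in the affirmative if the system
of $2n + 1$ equations

$$\sum_{\alpha=1}^{n+1} A_\alpha = m_0$$

$$\sum_{\alpha=1}^{n+1} A_\alpha \xi_\alpha = m_1$$

$$\sum_{\alpha=1}^{n+1} A_\alpha \xi_\alpha^2 = m_2 \tag{A}$$

$$\cdots\cdots$$

$$\sum_{\alpha=1}^{n+1} A_\alpha \xi_\alpha^{2n} = m_{2n}$$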

can be satisfied by real numbers $\xi_1, \xi_2, \ldots \xi_{n+1}$; $A_1, A_2, \ldots A_{n+1}$,
the last $n + 1$ numbers being positive. The number of unknowns being
greater by one unit than the number of equations, we can introduce the
additional requirement that one of the numbers $\xi_1, \xi_2, \ldots \xi_{n+1}$ should
be equal to a given real number $v$. The system (A) may be replaced by
the single requirement that the equation

$$\sum_{\alpha=1}^{n+1} A_\alpha T(\xi_\alpha) = \int_{-\infty}^{\infty} T(x)\,d\varphi(x) \tag{11}$$

shall hold for any polynomial $T(x)$ of degree $\le 2n$. Let $Q(x)$ be the
polynomial of degree $n + 1$ having roots $\xi_1, \xi_2, \ldots \xi_{n+1}$ and let $\theta(x)$ be
an arbitrary polynomial of degree $\le n - 1$. Then we can apply equation
(11) to

$$T(x) = \theta(x)Q(x).$$

Since $Q(\xi_\alpha) = 0$, we shall have

$$\int_{-\infty}^{\infty} \theta(x)Q(x)\,d\varphi(x) = 0 \tag{12}$$

for an arbitrary polynomial $\theta(x)$ of degree $\le n - 1$. Presently we shall
see that requirement (12) together with $Q(v) = 0$ determines $Q(x)$, save
for a constant factor, if

$$Q_n(v) \ne 0.$$

Dividing $Q(x)$ by $Q_n(x)$, we have identically

$$Q(x) = (\lambda x + \mu)Q_n(x) + R_{n-1}(x)$$

where $R_{n-1}(x)$ is a polynomial of degree $\le n - 1$. If $\theta(x)$ is an arbitrary
polynomial of degree $\le n - 2$,

$$(\lambda x + \mu)\theta(x)$$

will be of degree $\le n - 1$. Hence

$$\int_{-\infty}^{\infty} (\lambda x + \mu)\theta(x)Q_n(x)\,d\varphi(x) = 0$$

by (6), and (12) shows that

$$\int_{-\infty}^{\infty} \theta(x)R_{n-1}(x)\,d\varphi(x) = 0$$

for an arbitrary polynomial $\theta(x)$ of degree $\le n - 2$. The last requirement
shows that $R_{n-1}(x)$ differs from $Q_{n-1}(x)$ by a constant factor. Since
the highest coefficient in $Q(x)$ is arbitrary, we can set

$$R_{n-1}(x) = -Q_{n-1}(x).$$

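In the equation

$$Q(x) = (\lambda x + \mu)Q_n(x) - Q_{n-1}(x)$$

it remains to determine constants $\lambda$ and $\mu$. Multiplying both members by
$Q_{n-1}(x)\,d\varphi(x)$ and integrating between $-\infty$ and $\infty$, we get

$$\lambda\int_{-\infty}^{\infty} xQ_{n-1}Q_n\,d\varphi(x) = \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x)$$

or

$$\frac{\lambda}{\alpha_n}\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x) = \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x).$$

But

$$\frac{\int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x)}{\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x)} = \frac{\alpha_{n+1}}{\alpha_n}$$

whence

$$\lambda = \alpha_{n+1}.$$

The equation

$$0 = Q(v) = (\alpha_{n+1}v + \mu)Q_n(v) - Q_{n-1}(v)$$

serves to determine $\mu$ if $Q_n(v) \ne 0$. The final expression of $Q(x)$ will be

$$Q(x) = \left(\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right)Q_n(x) - Q_{n-1}(x).$$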
Owing to recurrence relations

$$Q_2 = (\alpha_2 x + \beta_2)Q_1 - Q_0; \quad Q_3 = (\alpha_3 x + \beta_3)Q_2 - Q_1; \quad \cdots$$

$$Q_n = (\alpha_n x + \beta_n)Q_{n-1} - Q_{n-2},$$

it is evident that

$$Q,\ Q_n,\ Q_{n-1},\ \ldots\ Q_1,\ Q_0 = 1$$

is a Sturm series. For $x = -\infty$, it contains $n + 1$ variations and for
$x = +\infty$ only permanences. It follows that the equation

$$Q(x) = 0$$

has exactly $n + 1$ distinct real roots and among them $v$. Thus, if the
problem is solvable, the numbers $\xi_1, \xi_2, \ldots \xi_{n+1}$ are determined as
roots of

$$Q(x) = 0.$$


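Furthermore, all unknowns $A_\alpha$ will be positive. In fact, from equation
(11) it follows that

$$A_\alpha = \int_{-\infty}^{\infty} \left[\frac{Q(x)}{(x - \xi_\alpha)\,Q'(\xi_\alpha)}\right]^2 d\varphi(x) > 0.$$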

Now we must show that constants $A_\alpha$ can actually be determined so as
to satisfy equations (A). To this end let

$$P(x) = \int_{-\infty}^{\infty} \frac{Q(x) - Q(z)}{x - z}\,d\varphi(z) = \left[\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right]P_n(x) - P_{n-1}(x).$$

Then

$$Q(x)\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - P(x) = \int_{-\infty}^{\infty} \frac{Q(z)\,d\varphi(z)}{x - z}$$

and, on account of (12), the expansion of the right member into power
series of $1/x$ lacks the terms in $1/x, 1/x^2, \ldots 1/x^n$. Hence, the expansion
of

$$\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - \frac{P(x)}{Q(x)}$$

lacks the terms in $1/x, 1/x^2, \ldots 1/x^{2n+1}$, that is,

$$\frac{P(x)}{Q(x)} = \frac{m_0}{x} + \frac{m_1}{x^2} + \cdots + \frac{m_{2n}}{x^{2n+1}} + \cdots.$$

On the other hand, resolving in simple fractions,

$$\frac{P(x)}{Q(x)} = \frac{A_1}{x - \xi_1} + \frac{A_2}{x - \xi_2} + \cdots + \frac{A_{n+1}}{x - \xi_{n+1}}.$$

Expanding the right member into power series of $1/x$ and comparing
with the preceding expansion, we obtain the system (A). By the previous
remark all constants $A_\alpha$ are positive. Thus, there exists a point distribution
in which masses concentrated in $n + 1$ points produce moments
$m_0, m_1, \ldots m_{2n}$. One of these points $v$ may be taken arbitrarily, with
the condition

$$Q_n(v) \ne 0$$

being observed, however.
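The simplest instance of this construction, $n = 1$, can be worked out explicitly. The sketch below assumes the particular moments $m_0 = 1$, $m_1 = 0$, $m_2 = 1/2$ (those of the density $e^{-x^2}/\sqrt{\pi}$ treated in Sec. 7), for which $Q_0 = 1$, $Q_1 = x$, $P_0 = 0$, $P_1 = 1$, and $\alpha_2 = 2$. The function name is hypothetical.

```python
from fractions import Fraction

def two_point_distribution(v):
    """Points and masses of the equivalent distribution for n = 1.

    Assumes m_0 = 1, m_1 = 0, m_2 = 1/2, so that
        Q(x) = (2(x - v) + 1/v) x - 1,   P(x) = 2(x - v) + 1/v,
    and the masses are A = P(xi) / Q'(xi) at the two roots xi of Q.
    """
    v = Fraction(v)
    if v == 0:
        raise ValueError("Q_1(v) = v must differ from 0")
    alpha2 = Fraction(2)
    # Q(x) = alpha2 x^2 + (1/v - alpha2 v) x - 1 has the roots v and -1/(alpha2 v).
    roots = [v, -1 / (alpha2 * v)]

    def p(x):
        return alpha2 * (x - v) + 1 / v

    def dq(x):
        return 2 * alpha2 * x - alpha2 * v + 1 / v

    masses = [p(r) / dq(r) for r in roots]
    return roots, masses

ROOTS, MASSES = two_point_distribution(Fraction(3, 4))
```

For a given $v$ the two masses come out as $1/(1 + 2v^2)$ and $2v^2/(1 + 2v^2)$, both positive, and the three equations of system (A) are satisfied exactly.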

6. Tshebysheff’s Inequalities. In a note referred to in the introduc¬ 
tion Tshebysheff made known certain inequalities of the utmost impor¬ 
tance for the theory we are concerned with. The first very ingenious 
proof of them was given by Markoff in 1884 and, by a remarkable 
coincidence, the same proof was rediscovered almost at the same time 
by Stieltjes. A few years later, Stieltjes found another totally different 
proof; and it is this second proof that we shall follow. 

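Let $\varphi(x)$ be a distribution function of a mass spread over the interval
$-\infty, +\infty$. Supposing that a moment of the order $i$,

$$\int_{-\infty}^{\infty} x^i\,d\varphi(x) = m_i,$$

exists, we shall show first that

$$\lim l^i\bigl(m_0 - \varphi(l)\bigr) = 0$$
$$\lim l^i\varphi(-l) = 0$$

when $l$ tends to $+\infty$. For

$$\int_l^{\infty} x^i\,d\varphi(x) \ge l^i\int_l^{\infty} d\varphi(x) = l^i[\varphi(+\infty) - \varphi(l)]$$

or

$$l^i\bigl(m_0 - \varphi(l)\bigr) \le \int_l^{\infty} x^i\,d\varphi(x).$$

Similarly

$$\left|\int_{-\infty}^{-l} x^i\,d\varphi(x)\right| \ge l^i\int_{-\infty}^{-l} d\varphi(x) = l^i\varphi(-l)$$

or

$$l^i\varphi(-l) \le \left|\int_{-\infty}^{-l} x^i\,d\varphi(x)\right|.$$

Now both integrals

$$\int_l^{\infty} x^i\,d\varphi(x) \quad\text{and}\quad \int_{-\infty}^{-l} x^i\,d\varphi(x)$$

converge to 0 as $l$ tends to $+\infty$; whence both statements follow immediately.
Integrating by parts, we have

$$\int_0^l x^i\,d\varphi(x) = l^i[\varphi(l) - m_0] - i\int_0^l [\varphi(x) - m_0]x^{i-1}\,dx$$

$$\int_{-l}^0 x^i\,d\varphi(x) = (-1)^{i-1}l^i\varphi(-l) - i\int_{-l}^0 x^{i-1}\varphi(x)\,dx,$$

whence, letting $l$ converge to $+\infty$,

$$m_i = \int_{-\infty}^{\infty} x^i\,d\varphi(x) = -i\int_0^{\infty} [\varphi(x) - m_0]x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\varphi(x)\,dx.$$

If the same mass $m_0$, with the same moment $m_i$, is spread according to
the law characterized by the function $\psi(x)$, we shall have

$$m_i = -i\int_0^{\infty} [\psi(x) - m_0]x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\psi(x)\,dx,$$

whence

$$\int_{-\infty}^{\infty} x^{i-1}[\varphi(x) - \psi(x)]\,dx = 0. \tag{13}$$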

Suppose the moments

$$m_0, m_1, m_2, \ldots m_{2n}$$

of the distribution characterized by $\varphi(x)$ are known. Provided $\varphi(x)$
has at least $n + 1$ points of increase, there exists an equivalent point
distribution, defined in Sec. 5 and characterized by the step function
$\psi(x)$ which can be defined as follows:

$$\psi(x) = 0 \quad\text{for}\quad -\infty < x < \xi_1$$
$$\psi(x) = A_1 \quad\text{for}\quad \xi_1 \le x < \xi_2$$
$$\psi(x) = A_1 + A_2 \quad\text{for}\quad \xi_2 \le x < \xi_3$$
$$\cdots\cdots$$
$$\psi(x) = A_1 + A_2 + \cdots + A_n \quad\text{for}\quad \xi_n \le x < \xi_{n+1}$$
$$\psi(x) = A_1 + A_2 + \cdots + A_{n+1} \quad\text{for}\quad \xi_{n+1} \le x < +\infty$$

provided roots $\xi_1, \xi_2, \ldots \xi_{n+1}$ of the equation $Q(x) = 0$ are arranged
in an increasing order of magnitude.

Equation (13) will hold for $i = 1, 2, 3, \ldots 2n$ or, which is the
same, the equation

$$\int_{-\infty}^{\infty} \theta(x)[\varphi(x) - \psi(x)]\,dx = 0 \tag{14}$$

will hold for an arbitrary polynomial $\theta(x)$ of degree $\le 2n - 1$. The
function

$$h(x) = \varphi(x) - \psi(x)$$

in general has ordinary discontinuities. We can prove now that $h(x)$, if
not identically equal to 0 at all points of continuity, changes its sign at
least $2n$ times.¹ Suppose, on the contrary, that it changes sign $r < 2n$
times; namely, at the points

$$a_1, a_2, \ldots a_r.$$

Taking

$$\theta(x) = (x - a_1)(x - a_2) \cdots (x - a_r),$$

equation (14) will be satisfied, while the integrand

$$\theta(x)h(x),$$

if not 0, will be of the same sign, for example, positive. Let $\xi$ be any
point of continuity of $h(x)$. If $\xi = a_i$ $(i = 1, 2, \ldots r)$ then $h(a_i) = 0$
since $h(x)$ changes sign at $a_i$. If $\xi$ does not coincide with any one of the
numbers $a_1, a_2, \ldots a_r$, then for an arbitrarily small positive $\epsilon$ we must
have

$$\int_{\xi-\epsilon}^{\xi+\epsilon} \theta(x)h(x)\,dx = 0.$$

But by continuity

$$\theta(x)h(x)$$

remains in the interval $(\xi - \epsilon, \xi + \epsilon)$ for sufficiently small $\epsilon$ above a
certain positive number unless $h(\xi) = 0$. Thus, if $h(x)$ does not vanish
at all points of continuity (in which case $\varphi(x)$ and $\psi(x)$ do not differ
essentially), it must change sign at least $2n$ times. Let us see now where
the change of sign can occur. In the intervals

$$-\infty,\ \xi_1 \quad\text{and}\quad \xi_{n+1},\ +\infty$$

$\varphi(x) - \psi(x)$ evidently cannot change sign.

Within each of the intervals

$$\xi_{i-1},\ \xi_i$$

there can be at most one change of sign, since $\psi(x)$ remains constant
there, and $\varphi(x)$ can only increase. The sign may change also at the
points of discontinuity of $\psi(x)$; that is, at the points $\xi_1, \xi_2, \ldots \xi_{n+1}$.
Altogether, $\varphi(x) - \psi(x)$ cannot change sign more than $2n + 1$ times
and not less than $2n$ times.

¹ A function $f(x)$ is said to change sign once in $(a, b)$ if in this interval there
exists a point or points $c$ such that, for instance, $f(x) \ge 0$ in $(a, c)$ and $f(x) \le 0$ in
$(c, b)$, equality signs not holding throughout the respective intervals. The change
of sign occurs $n$ times if $(a, b)$ can be divided in $n$ intervals in which $f(x)$ changes
sign once.

Since $\psi(x) = 0$ so far as $x < \xi_1$ and $\varphi(\xi_1 - \epsilon)$ is not negative for
positive $\epsilon$, we must have

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) \ge 0.$$

Also $\psi(x) = m_0$ for $x > \xi_{n+1}$ and $\varphi(x) \le m_0$, so that

$$\varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) \le 0.$$

At first let us suppose

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) > 0, \qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) < 0.$$

In this case $\varphi(x) - \psi(x)$ must change sign an odd number of times; that is,
not less than $2n + 1$ times. Since this cannot happen more than $2n + 1$
times, the number of times $\varphi(x) - \psi(x)$ changes its sign must be exactly
$2n + 1$. These changes occur once within each interval

$$\xi_{i-1},\ \xi_i$$

and in each of the points $\xi_1, \xi_2, \ldots \xi_{n+1}$. When the change of sign
occurs in the interval $(\xi_{i-1}, \xi_i)$ where $\psi(x)$ remains constant, because $\varphi(x)$
never decreases, we must have for sufficiently small $\epsilon$

$$\varphi(\xi_i - \epsilon) - \psi(\xi_i - \epsilon) > 0. \tag{15}$$

But the sign changes in passing the point $\xi_i$; therefore,

$$\varphi(\xi_i + \epsilon) - \psi(\xi_i + \epsilon) < 0. \tag{16}$$

The equalities

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0, \qquad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

cannot both hold for all sufficiently small $\epsilon$. For then there would not
be a change of sign at $\xi_1$ and $\xi_{n+1}$, so that the number of changes would
not be greater than $2n - 1$ which is impossible. Therefore, let

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0 \quad\text{and}\quad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) < 0.$$

Then there will be exactly $2n$ changes of sign: one in each of the intervals

$$\xi_{i-1},\ \xi_i$$

and in each of the points $\xi_2, \xi_3, \ldots \xi_{n+1}$. The inequalities (15) and
(16) would hold for $i \ge 2$, but

$$\varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) = 0, \qquad \varphi(\xi_1 + \epsilon) - \psi(\xi_1 + \epsilon) < 0$$

for all sufficiently small $\epsilon$.

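Now let

$$\varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0 \quad\text{and}\quad \varphi(\xi_1 - \epsilon) - \psi(\xi_1 - \epsilon) > 0$$

for all sufficiently small positive $\epsilon$. Then there will be exactly $2n$ changes
of sign: in each of the points $\xi_1, \xi_2, \ldots \xi_n$ and in each of the $n$ intervals

$$\xi_{i-1},\ \xi_i.$$

The inequalities (15) and (16) will again hold for $i \le n$, but

$$\varphi(\xi_{n+1} - \epsilon) - \psi(\xi_{n+1} - \epsilon) > 0 \quad\text{and}\quad \varphi(\xi_{n+1} + \epsilon) - \psi(\xi_{n+1} + \epsilon) = 0$$

for all sufficiently small $\epsilon$. Letting $\epsilon$ converge to 0, we shall have

$$\varphi(\xi_i - 0) \ge \psi(\xi_i - 0)$$

$$\varphi(\xi_i + 0) \le \psi(\xi_i + 0)$$

for $i = 1, 2, 3, \ldots n + 1$ in all cases. Then, since

$$\varphi(\xi_i - 0) \le \varphi(\xi_i) \le \varphi(\xi_i + 0),$$

we shall have also

$$\varphi(\xi_i) \ge \psi(\xi_i - 0)$$

$$\varphi(\xi_i) \le \psi(\xi_i + 0)$$

or, taking into consideration the definition of the function $\psi(x)$,

$$\sum_{l=1}^{i-1} \frac{P(\xi_l)}{Q'(\xi_l)} \le \varphi(\xi_i) \le \sum_{l=1}^{i} \frac{P(\xi_l)}{Q'(\xi_l)}.$$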
These are the inequalities to which Tshebysheff's name is justly
attached. For a particular root $\xi_i = v$ they can be written thus:

$$\sum_{\xi_l < v} \frac{P(\xi_l)}{Q'(\xi_l)} \le \varphi(v) \le \sum_{\xi_l \le v} \frac{P(\xi_l)}{Q'(\xi_l)} \tag{17}$$

with the evident meaning of the extent of summations. Another, less
explicit, form of the same inequalities is

$$\varphi(v) \ge \psi(v - 0), \qquad \varphi(v) \le \psi(v + 0). \tag{18}$$

As to $P(x)$ and $Q(x)$, they can be taken in the form:

$$P(x) = [\alpha_{n+1}(x - v)Q_n(v) + Q_{n-1}(v)]P_n(x) - Q_n(v)P_{n-1}(x)$$

$$Q(x) = [\alpha_{n+1}(x - v)Q_n(v) + Q_{n-1}(v)]Q_n(x) - Q_n(v)Q_{n-1}(x).$$

Thus far we have assumed that $v$ was different from any root of the
equation

$$Q_n(x) = 0,$$

but all the results hold, even if

$$Q_n(v) = 0.$$

To prove this, we note first that when a variable $v$ approaches a root $\xi$ of
$Q_n(x)$, one root of $Q(x)$ (either $\xi_1$ or $\xi_{n+1}$) tends to $-\infty$ or $+\infty$, while the
remaining $n$ roots approach the $n$ roots $x_1, x_2, \ldots x_n$ of the equation

$$Q_n(x) = 0.$$

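If $\xi_1$ tends to negative infinity, it is easy to see that

$$\frac{P(\xi_1)}{Q'(\xi_1)}$$

tends to 0. In this case the other quotients

$$\frac{P(\xi_l)}{Q'(\xi_l)}; \quad l = 2, 3, \ldots n + 1$$

tend respectively to

$$\frac{P_n(x_1)}{Q_n'(x_1)},\ \frac{P_n(x_2)}{Q_n'(x_2)},\ \ldots\ \frac{P_n(x_n)}{Q_n'(x_n)}.$$

If $\xi_{n+1}$ tends to positive infinity the quotients

$$\frac{P(\xi_l)}{Q'(\xi_l)}; \quad l = 1, 2, 3, \ldots n$$

approach respectively

$$\frac{P_n(x_l)}{Q_n'(x_l)}; \quad l = 1, 2, 3, \ldots n$$

while

$$\frac{P(\xi_{n+1})}{Q'(\xi_{n+1})}$$

tends to 0. Now take $v = \xi - \epsilon$ and $v = \xi + \epsilon$ in (17) and let the positive
number $\epsilon$ converge to 0. In the limit we obtain

$$\varphi(\xi - 0) \ge \sum_{x_l < \xi} \frac{P_n(x_l)}{Q_n'(x_l)}, \qquad \varphi(\xi + 0) \le \sum_{x_l \le \xi} \frac{P_n(x_l)}{Q_n'(x_l)}$$

whence again

$$\sum_{x_l < \xi} \frac{P_n(x_l)}{Q_n'(x_l)} \le \varphi(\xi) \le \sum_{x_l \le \xi} \frac{P_n(x_l)}{Q_n'(x_l)}.$$

But these inequalities follow directly from (17) by taking $v = \xi$.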
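Since

$$\psi(v + 0) - \psi(v - 0) = \frac{P(v)}{Q'(v)}$$

it follows from inequalities (18) that

$$0 \le \varphi(v) - \psi(v - 0) \le \frac{P(v)}{Q'(v)}.$$

On the other hand, one easily finds that

$$\frac{P(v)}{Q'(v)} = \frac{1}{\alpha_{n+1}Q_n(v)^2 + Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v)}.$$

But referring to the end of Sec. 4,

$$Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v) = \sum_{s=1}^{n} \alpha_s Q_{s-1}(v)^2,$$

whence

$$\alpha_{n+1}Q_n(v)^2 + Q_n'(v)Q_{n-1}(v) - Q_{n-1}'(v)Q_n(v) = Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v).$$

Finally,

$$0 \le \varphi(v) - \psi(v - 0) \le \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}.$$

If $\varphi_1(v)$ is another distribution function with the same moments

$$m_0, m_1, m_2, \ldots m_{2n},$$

we shall have also

$$0 \le \varphi_1(v) - \psi(v - 0) \le \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}$$

and as a consequence,

$$|\varphi_1(v) - \varphi(v)| \le \chi_n(v), \tag{19}$$

a very important inequality. Here for brevity we use the notation

$$\chi_n(v) = \frac{1}{Q_{n+1}'(v)Q_n(v) - Q_n'(v)Q_{n+1}(v)}.$$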

7. Application to Normal Distribution. An important particular
case is that of a normal distribution characterized by the function

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du.$$

In this case it is easy to give an explicit expression of the polynomials
$Q_n(x)$. Let

$$H_n(x) = e^{x^2}\frac{d^n e^{-x^2}}{dx^n}.$$

Integrating by parts, one can prove that for $l \le n - 1$

$$\int_{-\infty}^{\infty} e^{-x^2}x^l H_n(x)\,dx = 0.$$

Hence, one may conclude that $Q_n(x)$ differs from $H_n(x)$ by a constant
factor. Let

$$Q_n(x) = c_n H_n(x).$$

To determine $c_n$, we may use the relation

$$H_n(x) = -2xH_{n-1}(x) - 2(n - 1)H_{n-2}(x)$$

which can readily be established. Introducing polynomials $Q_n$, this
relation becomes

$$Q_n(x) = -2x\frac{c_n}{c_{n-1}}Q_{n-1}(x) - 2(n - 1)\frac{c_n}{c_{n-2}}Q_{n-2}(x).$$

Hence,

$$\frac{c_n}{c_{n-2}} = \frac{1}{2n - 2}, \qquad \alpha_n = -2\frac{c_n}{c_{n-1}}, \qquad \beta_n = 0.$$

Since $H_0(x) = Q_0(x) = 1$, we have $c_0 = 1$; also

$$\alpha_1 = 1,$$

whence $c_1 = -\frac{1}{2}$. The knowledge of $c_0$ and $c_1$ together with the relation

$$c_n = \frac{c_{n-2}}{2n - 2}$$

allows determination of all members of the sequence $c_2, c_3, c_4, \ldots$.
The final expressions are as follows:

$$c_{2m} = \frac{1}{2^m \cdot 1 \cdot 3 \cdot 5 \cdots (2m - 1)}$$

$$c_{2m+1} = \frac{-1}{2^{m+1} \cdot 2 \cdot 4 \cdot 6 \cdots 2m}.$$
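The closed expressions for $c_{2m}$ and $c_{2m+1}$ can be compared with the values produced by the recurrence $c_n = c_{n-2}/(2n - 2)$ starting from $c_0 = 1$, $c_1 = -\frac{1}{2}$. The following is a small check in exact arithmetic; the function names are chosen here for illustration only.

```python
from fractions import Fraction

def c_by_recurrence(n_max):
    """c_0, ..., c_{n_max} from c_0 = 1, c_1 = -1/2, c_n = c_{n-2} / (2n - 2)."""
    c = [Fraction(1), Fraction(-1, 2)]
    for n in range(2, n_max + 1):
        c.append(c[n - 2] / (2 * n - 2))
    return c[: n_max + 1]

def c_closed_form(n):
    """Closed expression for c_n, treating even and odd n separately."""
    m = n // 2
    if n % 2 == 0:
        odd_product = 1                      # 1 * 3 * 5 ... (2m - 1)
        for k in range(1, m + 1):
            odd_product *= 2 * k - 1
        return Fraction(1, 2 ** m * odd_product)
    even_product = 1                         # 2 * 4 * 6 ... 2m
    for k in range(1, m + 1):
        even_product *= 2 * k
    return Fraction(-1, 2 ** (m + 1) * even_product)

C_VALUES = c_by_recurrence(12)
```

For instance $c_2 = \frac{1}{2}$, $c_3 = -\frac{1}{8}$, $c_4 = \frac{1}{12}$ by either route.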

From the above relation between $H_n(x)$, $H_{n-1}(x)$, $H_{n-2}(x)$ and owing to
the fact that $H_n(x)$ is an even or odd polynomial, according as $n$ is even or
odd, one finds

$$H_{2m}(0) = (-2)^m \cdot 1 \cdot 3 \cdot 5 \cdots (2m - 1),$$

while another relation

$$H_n'(x) = -2nH_{n-1}(x),$$

following from the definition of $H_n(x)$, gives

$$H_{2m-1}'(0) = (-2)^m \cdot 1 \cdot 3 \cdot 5 \cdots (2m - 1).$$

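These preliminaries being established, we shall prove now that $\chi_n(v)$
attains its maximum for $v = 0$. Let

$$\Omega = H_{n+1}'(v)H_n(v) - H_n'(v)H_{n+1}(v).$$

Then, taking into account the differential equation for polynomials
$H_n(v)$:

$$H_n''(v) = 2vH_n'(v) - 2nH_n(v)$$

we find that

$$\frac{d\Omega}{dv} = 2v\Omega - 2H_n(v)H_{n+1}(v).$$

On the other hand,

$$\Omega = -H_{n+1}(v)^2\,\frac{d}{dv}\frac{H_n(v)}{H_{n+1}(v)}$$

and denoting roots of the polynomial $H_{n+1}(v)$ in general by $\xi$,

$$\frac{d}{dv}\frac{H_n(v)}{H_{n+1}(v)} = -\sum \frac{H_n(\xi)}{H_{n+1}'(\xi)}\frac{1}{(v - \xi)^2} = \frac{1}{2n + 2}\sum \frac{1}{(v - \xi)^2}.$$

Consequently

$$\Omega = -\frac{H_{n+1}(v)^2}{2n + 2}\sum \frac{1}{(v - \xi)^2}.$$

Again

$$H_n(v)H_{n+1}(v) = -\frac{H_{n+1}(v)^2}{2n + 2}\sum \frac{1}{v - \xi}$$

and so

$$\frac{d\Omega}{dv} = -\frac{H_{n+1}(v)^2}{n + 1}\sum \frac{\xi}{(v - \xi)^2}.$$

Roots of the polynomial $H_{n+1}(x)$ being symmetrically located with
respect to 0, we have:

$$\sum \frac{\xi}{(v - \xi)^2} = 4v\sum_{\xi > 0} \frac{\xi^2}{(v^2 - \xi^2)^2}$$

and finally

$$\frac{d\Omega}{dv} = -\frac{4vH_{n+1}(v)^2}{n + 1}\sum_{\xi > 0} \frac{\xi^2}{(v^2 - \xi^2)^2}.$$

Hence

$$\frac{d\Omega}{dv} > 0 \quad\text{if}\quad v < 0; \qquad \frac{d\Omega}{dv} < 0 \quad\text{if}\quad v > 0;$$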

that is, $\Omega(v)$ attains its maximum for $v = 0$ and $\chi_n(v)$ attains its maximum
for $v = 0$. Referring to the above expressions of $c_{2m}$, $c_{2m+1}$; $H_{2m}(0)$,
$H_{2m+1}'(0)$, we find that

$$\chi_{2m}(0) = \frac{2 \cdot 4 \cdot 6 \cdots 2m}{3 \cdot 5 \cdot 7 \cdots (2m + 1)}$$

$$\chi_{2m+1}(0) = \frac{2 \cdot 4 \cdot 6 \cdots 2m}{3 \cdot 5 \cdot 7 \cdots (2m + 1)}.$$

In Appendix I, page 354, we find the inequality

$$\frac{2 \cdot 4 \cdot 6 \cdots 2m}{1 \cdot 3 \cdot 5 \cdots (2m - 1)}\,\frac{1}{\sqrt{4m + 2}} < \frac{\sqrt{\pi}}{2}$$

whence

$$\frac{2 \cdot 4 \cdot 6 \cdots 2m}{3 \cdot 5 \cdot 7 \cdots (2m + 1)} < \sqrt{\frac{\pi}{4m + 2}}.$$

Thus, in all cases

$$\chi_n(v) \le \chi_n(0) < \sqrt{\frac{\pi}{2n}}$$

whence, by virtue of inequality (19),

$$|\varphi_1(v) - \varphi(v)| < \sqrt{\frac{\pi}{2n}}.$$

Thus any distribution function $\varphi_1(v)$ with the moments

$$m_0 = 1, \qquad m_{2k-1} = 0, \qquad m_{2k} = \frac{1 \cdot 3 \cdot 5 \cdots (2k - 1)}{2^k} \qquad (k \le n)$$

corresponding to

$$\varphi(v) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-u^2}\,du$$

differs from $\varphi(v)$ by less than

$$\sqrt{\frac{\pi}{2n}}.$$

Since this quantity tends to 0 when $n$ increases indefinitely, we have the
following theorem proved for the first time by Tshebysheff:

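The system of infinitely many equations

$$\int_{-\infty}^{\infty} d\varphi(x) = 1; \qquad \int_{-\infty}^{\infty} x^{2k-1}\,d\varphi(x) = 0;$$

$$\int_{-\infty}^{\infty} x^{2k}\,d\varphi(x) = \frac{1 \cdot 3 \cdot 5 \cdots (2k - 1)}{2^k}; \qquad k = 1, 2, 3, \ldots$$

uniquely determines a never decreasing function $\varphi(x)$ such that $\varphi(-\infty) = 0$;
namely,

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du.$$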
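The bound of Sec. 7 can be followed numerically. The sketch below evaluates $\chi_n(0)$ from the identity $Q_{n+1}'Q_n - Q_n'Q_{n+1} = \sum_{s=1}^{n+1} \alpha_s Q_{s-1}^2$ of Sec. 4, using $\beta_s = 0$, $\alpha_1 = 1$, and the relation $\alpha_{s+1}\alpha_s = 2/s$, which is a consequence of $\alpha_s = -2c_s/c_{s-1}$ and $c_s/c_{s-2} = 1/(2s - 2)$. The function names are assumptions made for this illustration.

```python
from fractions import Fraction
from math import pi, sqrt

def chi_at_zero(n):
    """chi_n(0) = 1 / sum_{s=1}^{n+1} alpha_s * Q_{s-1}(0)^2 for the normal case."""
    alpha = Fraction(1)                           # alpha_1
    q_prev, q_curr = Fraction(0), Fraction(1)     # Q_{-1}(0) = 0, Q_0(0) = 1
    total = Fraction(0)
    for s in range(1, n + 2):
        total += alpha * q_curr ** 2              # alpha_s * Q_{s-1}(0)^2
        q_prev, q_curr = q_curr, -q_prev          # Q_s(0) = -Q_{s-2}(0), as beta_s = 0
        alpha = Fraction(2, s) / alpha            # alpha_{s+1} = 2 / (s * alpha_s)
    return 1 / total

def chi_closed_form(n):
    """2*4*...*2m / (3*5*...*(2m+1)) with m the integer part of n/2."""
    m = n // 2
    value = Fraction(1)
    for k in range(1, m + 1):
        value *= Fraction(2 * k, 2 * k + 1)
    return value

def bound(n):
    return sqrt(pi / (2 * n))
```

For example $\chi_2(0) = \chi_3(0) = \frac{2}{3}$, while $\sqrt{\pi/4} \approx 0.886$ and $\sqrt{\pi/6} \approx 0.724$; the bound decreases only like $n^{-1/2}$, so many moments are needed for a close approximation.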

8. Tshebysheff-Markoff's Fundamental Theorem. When a mass = 1
is distributed according to the law characterized by a function $F(x, \lambda)$
depending upon a parameter $\lambda$, we say that the distribution is variable.
Notwithstanding the variability of distribution, it may happen that its
moments remain constant. If they are equal to moments of normal
distribution with density

$$\frac{1}{\sqrt{\pi}}e^{-x^2},$$

then by the preceding theorem we have rigorously

$$F(x, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du$$

no matter what $\lambda$ is.

Generally moments of a variable distribution are themselves variable.
Suppose that each one of them, when $\lambda$ tends to a certain limit (for
instance $\infty$), tends to the corresponding moment of normal distribution.
One can foresee that under such circumstances $F(x, \lambda)$ will tend to

$$\varphi(x) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du.$$

In fact, the following fundamental theorem holds:

Fundamental Theorem. If, for a variable distribution characterized
by the function $F(x, \lambda)$,

$$\lim_{\lambda \to \infty} \int_{-\infty}^{\infty} x^k\,dF(x, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-x^2}x^k\,dx$$

for any fixed $k = 0, 1, 2, 3, \ldots$, then

$$\lim_{\lambda \to \infty} F(v, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-x^2}\,dx$$

uniformly in $v$.

Proof. Let

$$m_0, m_1, m_2, \ldots m_{2n}$$

be $2n + 1$ moments corresponding to a normal distribution. They
allow formation of the polynomials

$$Q_0(x), Q_1(x), \ldots Q_n(x) \quad\text{and}\quad Q(x)$$

and the function designated in Sec. 6 by $\psi(x)$. Similar entities corresponding
to the variable distribution will be specified by an asterisk.
Since

$$m_k^* \to m_k \quad\text{as}\quad \lambda \to \infty$$

and since $\Delta_n > 0$, we shall have

$$\Delta_n^* > 0$$

for sufficiently large $\lambda$. Then $F(x, \lambda)$ will have not less than $n + 1$
points of increase and the whole theory can be applied to variable
distribution. In particular, we shall have

$$0 \le \varphi(v) - \psi(v - 0) \le \chi_n(v)$$
$$0 \le F(v, \lambda) - \psi^*(v - 0) \le \chi_n^*(v). \tag{20}$$

Now $Q_s^*(x)$ $(s = 0, 1, 2, \ldots n)$ and $Q^*(x)$ depend rationally upon
$m_k^*$ $(k = 0, 1, 2, \ldots 2n)$; hence, without any difficulty one can see that

$$Q_s^*(x) \to Q_s(x); \quad s = 0, 1, 2, \ldots n$$
$$Q^*(x) \to Q(x)$$

as $\lambda \to \infty$; whence,

$$\chi_n^*(v) \to \chi_n(v).$$

Again

$$\psi^*(v - 0) \to \psi(v - 0)$$

as $\lambda \to \infty$. A few explanations are necessary to prove this. At first let
$Q_n(v) \ne 0$. Then the polynomial $Q(x)$ will have $n + 1$ roots

$$\xi_1 < \xi_2 < \xi_3 < \cdots < \xi_{n+1}.$$

Since the roots of an algebraic equation vary continuously with its
coefficients, it is evident that for sufficiently large $\lambda$ the equation

$$Q^*(x) = 0$$

will have $n + 1$ roots:

$$\xi_1^* < \xi_2^* < \xi_3^* < \cdots < \xi_{n+1}^*$$

and $\xi_k^*$ will tend to $\xi_k$ as $\lambda \to \infty$. In this case, it is evident that $\psi^*(v - 0)$
will tend to $\psi(v - 0)$. If $Q_n(v) = 0$, it may happen that $\xi_1^*$ or $\xi_{n+1}^*$ tends
respectively to $-\infty$ or $+\infty$ as $\lambda \to \infty$, while the other roots tend to the
roots

$$x_1, x_2, \ldots x_n$$

of the equation

$$Q_n(x) = 0.$$

But the terms in $\psi^*(v - 0)$ corresponding to infinitely increasing roots
tend to 0, and again

$$\psi^*(v - 0) \to \psi(v - 0).$$


Now

$$\chi_n(v) < \sqrt{\frac{\pi}{2n}}.$$

Consequently, given an arbitrary positive number $\epsilon$, we can select $n$ so
large as to have

$$\chi_n(v) < \sqrt{\frac{\pi}{2n}} < \epsilon.$$

Having selected $n$ in this manner, we shall keep it fixed. Then by the
preceding remarks a number $L$ can be found so that

$$\chi_n^*(v) < \epsilon$$

$$|\psi(v - 0) - \psi^*(v - 0)| < \epsilon$$

for $\lambda > L$. Combining this with inequalities (20), we find

$$|F(v, \lambda) - \varphi(v)| < 3\epsilon$$

for $\lambda > L$. And this proves the convergence of $F(v, \lambda)$ to $\varphi(v)$ for a
fixed arbitrary $v$. To show that the equation

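$$\lim F(v, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-x^2}\,dx$$

holds uniformly for a variable $v$ we can follow a very simple reasoning due
to Pólya. Since $\varphi(-\infty) = 0$, $\varphi(+\infty) = 1$ and $\varphi(x)$ is an increasing
function, one can determine two numbers $a_0$ and $a_n$ so that

$$\varphi(x) \le \varphi(a_0) < \frac{\epsilon}{2} \quad\text{for}\quad x \le a_0$$

$$1 - \varphi(x) \le 1 - \varphi(a_n) < \frac{\epsilon}{2} \quad\text{for}\quad x \ge a_n.$$

Next, because $\varphi(x)$ is a continuous function, the interval $(a_0, a_n)$ can be
subdivided into partial intervals by inserting between $a_0$ and $a_n$ points
$a_1 < a_2 < \cdots < a_{n-1}$ so that

$$0 < \varphi(a_{k+1}) - \varphi(a_k) < \frac{\epsilon}{2}$$

for $k = 0, 1, 2, \ldots n - 1$. By the preceding result, for all sufficiently
large $\lambda$

$$F(a_0, \lambda) < \frac{\epsilon}{2}, \qquad 1 - F(a_n, \lambda) < \frac{\epsilon}{2}$$

and

$$|F(a_k, \lambda) - \varphi(a_k)| < \frac{\epsilon}{2}; \quad k = 1, 2, \ldots n - 1.$$

Now consider the interval $(-\infty, a_0)$. Here for $v \le a_0$

$$0 \le F(v, \lambda) < \frac{\epsilon}{2}, \qquad 0 < \varphi(v) < \frac{\epsilon}{2}$$

and

$$|F(v, \lambda) - \varphi(v)| < \epsilon.$$

For $v$ belonging to the interval $(a_n, +\infty)$

$$0 \le 1 - F(v, \lambda) < \frac{\epsilon}{2}, \qquad 0 < 1 - \varphi(v) < \frac{\epsilon}{2}$$

whence again

$$|F(v, \lambda) - \varphi(v)| < \epsilon.$$

Finally, let

$$a_k \le v < a_{k+1} \qquad (k = 0, 1, 2, \ldots n - 1).$$

Then

$$F(v, \lambda) - \varphi(v) \ge F(a_k, \lambda) - \varphi(a_{k+1}) = [F(a_k, \lambda) - \varphi(a_k)] + [\varphi(a_k) - \varphi(a_{k+1})]$$

$$F(v, \lambda) - \varphi(v) \le F(a_{k+1}, \lambda) - \varphi(a_k) = [F(a_{k+1}, \lambda) - \varphi(a_{k+1})] + [\varphi(a_{k+1}) - \varphi(a_k)].$$

But

$$F(a_k, \lambda) - \varphi(a_k) > -\frac{\epsilon}{2}, \qquad F(a_{k+1}, \lambda) - \varphi(a_{k+1}) < \frac{\epsilon}{2}, \qquad \varphi(a_k) - \varphi(a_{k+1}) > -\frac{\epsilon}{2}$$

whence

$$-\epsilon < F(v, \lambda) - \varphi(v) < \epsilon.$$

Thus, given $\epsilon$, there exists a number $L(\epsilon)$ depending upon $\epsilon$ alone and
such that

$$|F(v, \lambda) - \varphi(v)| < \epsilon$$

for $\lambda > L(\epsilon)$ no matter what value is attributed to $v$.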

The fundamental theorem with reference to probability can be stated 
as follows: 

Let $s_n$ be a stochastic variable depending upon a variable positive integer
$n$. If the mathematical expectation $E(s_n^k)$ for any fixed $k = 1, 2, 3, \ldots$
tends, as $n$ increases indefinitely, to the corresponding expectation

$$E(x^k) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} x^k e^{-x^2}\,dx$$

of a normally distributed variable, then the probability of the inequality

$$s_n < v$$

tends to the limit

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-x^2}\,dx$$

and that uniformly in $v$.





In very many cases it is much easier to make sure that the conditions 
of this theorem are fulfilled and then, in one stroke, to pass to the limit 
theorem for probability, than to attack the problem directly. 
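To see the hypothesis of the theorem at work in a simple case, one may take $s_n = (z_1 + z_2 + \cdots + z_n)/\sqrt{2n}$ where each $z_k$ is $+1$ or $-1$ with probability $\frac{1}{2}$. This example is an assumption made for illustration and is not treated in the text. Its even moments can be computed exactly from the binomial distribution and compared with the normal moments $m!/\bigl(2^m (m/2)!\bigr)$.

```python
from fractions import Fraction
from math import comb, factorial

def scaled_moment(n, m):
    """E[(S_n / sqrt(2n))^m] for even m, S_n being a sum of n independent signs."""
    if m % 2:
        raise ValueError("use an even m; odd moments vanish by symmetry")
    # S_n = 2j - n with probability C(n, j) / 2^n.
    raw = sum(comb(n, j) * (2 * j - n) ** m for j in range(n + 1))
    return Fraction(raw, 2 ** n * (2 * n) ** (m // 2))

def normal_moment(m):
    """(1/sqrt(pi)) * integral of x^m exp(-x^2) dx for even m."""
    return Fraction(factorial(m), 2 ** m * factorial(m // 2))
```

The second moment equals $\frac{1}{2}$ for every $n$, the fourth is $\frac{3}{4} - \frac{1}{2n}$, and higher even moments likewise approach their normal values as $n$ increases.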

Application to Sums of Independent Variables 

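9. Let $z_1, z_2, z_3, \ldots$ be independent variables whose number can be
increased indefinitely. Without losing anything in generality, we may
suppose from the beginning

$$E(z_k) = 0; \quad k = 1, 2, 3, \ldots.$$

We assume the existence of

$$E(z_k^2) = b_k$$

for all $k = 1, 2, 3, \ldots$. Also, we assume for some positive $\delta$ the
existence of absolute moments

$$E|z_k|^{2+\delta} = \mu_k^{(2+\delta)}; \quad k = 1, 2, 3, \ldots.$$

Liapounoff's theorem, with which we dealt at length in Chap. XIV,
states that the probability of the inequality

$$\frac{z_1 + z_2 + \cdots + z_n}{\sqrt{2B_n}} < t,$$

where

$$B_n = b_1 + b_2 + \cdots + b_n,$$

tends uniformly to the limit

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{t} e^{-x^2}\,dx$$

as $n \to \infty$, provided

$$\frac{\mu_1^{(2+\delta)} + \mu_2^{(2+\delta)} + \cdots + \mu_n^{(2+\delta)}}{B_n^{1+\delta/2}} \to 0.$$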

Liapounoff's result in regard to generality of conditions surpassed by
far what had been established before by Tshebysheff and Markoff, whose
proofs were based on the fundamental result derived in the preceding section.
Since Liapounoff's conditions do not require the existence of
moments in an infinite number, it seemed that the method of moments
was not powerful enough to establish the limit theorem in such a general
form. Nevertheless, by resorting to an ingenious artifice, of which we
made use in Chap. X, Sec. 8, Markoff finally succeeded in proving the
limit theorem by the method of moments to the same degree of generality
as did Liapounoff.

Markoff's artifice consists in associating with the variable $z_k$ two new
variables $x_k$ and $y_k$ defined as follows:

Let $N$ be a positive number which in the course of proof will be
selected so as to tend to infinity together with $n$. Then

$$x_k = z_k, \quad y_k = 0 \quad\text{if}\quad |z_k| \le N$$

$$x_k = 0, \quad y_k = z_k \quad\text{if}\quad |z_k| > N.$$

Evidently $z_k$, $x_k$, $y_k$ are connected by the relation

$$z_k = x_k + y_k$$

whence

$$E(x_k) + E(y_k) = 0. \tag{21}$$

Moreover

$$E(x_k^2) + E(y_k^2) = E(z_k^2) = b_k$$
$$E|x_k|^{2+\delta} + E|y_k|^{2+\delta} = E|z_k|^{2+\delta} = \mu_k^{(2+\delta)} \tag{22}$$

as one can see immediately from the definition of $x_k$ and $y_k$.
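Relations (22) rest on the fact that, for every value of $z_k$, one of the two parts $x_k$, $y_k$ vanishes, so that powers of $z_k$ split without cross terms. A minimal sketch of the splitting follows; the function name `markoff_split` and the sample values are hypothetical.

```python
def markoff_split(z, cutoff):
    """Return (x, y) with x = z, y = 0 if |z| <= cutoff, else x = 0, y = z."""
    if abs(z) <= cutoff:
        return z, 0
    return 0, z

SAMPLES = [-7, -3, -1, 0, 2, 3, 5, 11]
PARTS = [markoff_split(z, 3) for z in SAMPLES]
```

Since $x_k y_k = 0$ identically, $|z_k|^p = |x_k|^p + |y_k|^p$ for every positive exponent $p$, and taking expectations gives (22).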

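Since $x_k$ is bounded, mathematical expectations

$$E(x_k^l)$$

exist for all integer exponents $l = 1, 2, 3, \ldots$ and for $k = 1, 2, 3, \ldots$.

In the following we shall use the notations

$$|E(x_k^l)| = c_k^{(l)}; \quad l = 1, 2, 3, \ldots$$

$$c_1^{(2)} + c_2^{(2)} + \cdots + c_n^{(2)} = B_n'$$

$$\mu_1^{(2+\delta)} + \mu_2^{(2+\delta)} + \cdots + \mu_n^{(2+\delta)} = C_n.$$

Not to obscure the essential steps of the reasoning we shall first
establish a few preliminary results.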

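Lemma 1. Let $q_k$ represent the probability that $y_k \ne 0$; then

$$q_1 + q_2 + \cdots + q_n \le \frac{C_n}{N^{2+\delta}}.$$

Proof. Let $\varphi_k(x)$ be the distribution function of $z_k$. Since $y_k \ne 0$
only if $|z_k| > N$, the probability $q_k$ is not greater than

$$\int_{-\infty}^{-N} d\varphi_k(x) + \int_{N}^{\infty} d\varphi_k(x).$$

On the other hand,

$$\int_{-\infty}^{-N} |x|^{2+\delta}\,d\varphi_k(x) + \int_{N}^{\infty} |x|^{2+\delta}\,d\varphi_k(x) \le \mu_k^{(2+\delta)}.$$

But

$$\int_{-\infty}^{-N} |x|^{2+\delta}\,d\varphi_k(x) + \int_{N}^{\infty} |x|^{2+\delta}\,d\varphi_k(x) \ge N^{2+\delta}\left[\int_{-\infty}^{-N} d\varphi_k(x) + \int_{N}^{\infty} d\varphi_k(x)\right]$$

whence

$$q_k \le \int_{-\infty}^{-N} d\varphi_k(x) + \int_{N}^{\infty} d\varphi_k(x) \le \frac{\mu_k^{(2+\delta)}}{N^{2+\delta}}.$$

The inequality to be proved follows immediately.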
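Lemma 2. The following inequality holds:

$$1 \ge \frac{B_n'}{B_n} \ge 1 - \frac{C_n}{B_n N^{\delta}}.$$

Proof. From

$$E|y_k|^{2+\delta} \le \mu_k^{(2+\delta)}$$

which is a consequence of the second equation (22) it follows that

$$E(y_k^2) \le \frac{\mu_k^{(2+\delta)}}{N^{\delta}}.$$

The first equation (22)

$$c_k^{(2)} + E(y_k^2) = b_k$$

gives

$$b_k \ge c_k^{(2)} \ge b_k - \frac{\mu_k^{(2+\delta)}}{N^{\delta}}.$$

Taking the sum for $k = 1, 2, 3, \ldots n$, we get

$$B_n \ge B_n' \ge B_n - \frac{C_n}{N^{\delta}}$$

whence

$$1 \ge \frac{B_n'}{B_n} \ge 1 - \frac{C_n}{B_n N^{\delta}}.$$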

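Lemma 3. For $e \ge 3$,

$$\frac{c_1^{(e)} + c_2^{(e)} + \cdots + c_n^{(e)}}{B_n^{e/2}} \le \left(\frac{N}{\sqrt{B_n}}\right)^{e-2}.$$

Proof. This inequality follows immediately from the evident
inequalities

$$c_k^{(e)} \le E|x_k|^e \le N^{e-2}E(x_k^2) \le N^{e-2}b_k.$$

Lemma 4. The following inequality holds

$$\frac{c_1^{(1)} + c_2^{(1)} + \cdots + c_n^{(1)}}{B_n^{1/2}} \le \frac{C_n}{B_n^{1/2}N^{1+\delta}}.$$

Proof. Since

$$E(x_k) + E(y_k) = 0,$$

we have

$$c_k^{(1)} = |E(x_k)| = |E(y_k)| \le E|y_k|.$$

On the other hand, by virtue of Schwarz's inequality

$$[E|y_1| + E|y_2| + \cdots + E|y_n|]^2 \le (q_1 + q_2 + \cdots + q_n)\sum_{k=1}^{n} E(y_k^2) \le \frac{C_n^2}{N^{2+2\delta}}$$

whence the statement follows immediately.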

If the variable integer $N$ should be subject to the requirements that
both the ratios

$$\frac{N}{\sqrt{B_n}} \quad\text{and}\quad \frac{C_n}{N^{2+\delta}}$$

should tend to 0 when $n$ increases indefinitely, then the preceding lemmas
would give three important corollaries. But before stating these
corollaries we must ascertain the possibility of selecting $N$ as required.
It suffices to take

$$N = (B_n C_n)^{\frac{1}{4+\delta}}.$$

Then

$$\frac{N}{\sqrt{B_n}} = \left(\frac{C_n}{B_n^{1+\delta/2}}\right)^{\frac{1}{4+\delta}} \to 0$$

by virtue of Liapounoff's condition.
Also

$$\frac{C_n}{N^{2+\delta}} = \left(\frac{C_n}{B_n^{1+\delta/2}}\right)^{\frac{2}{4+\delta}}$$

will tend to 0. By selecting $N$ in this manner we can state the following
corollaries:

Corollary 1. The sum

$$q_1 + q_2 + \cdots + q_n$$

tends to 0 as $n \to \infty$.

Corollary 2. The ratio

$$\frac{B_n'}{B_n}$$

tends to 1.

Corollary 3. The ratio

$$\frac{c_1^{(e)} + c_2^{(e)} + \cdots + c_n^{(e)}}{B_n^{e/2}}$$

tends to 0 for all positive integer exponents $e$ except $e = 2$.

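10. Let $F_n(t)$ and $\Phi_n(t)$ represent, respectively, the probabilities of the
inequalities

$$\frac{z_1 + z_2 + \cdots + z_n}{\sqrt{2B_n}} < t,$$

$$\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}} < t.$$

By repeating the reasoning developed in Chap. X, Sec. 8, we find that

$$|F_n(t) - \Phi_n(t)| \le q_1 + q_2 + \cdots + q_n.$$

Hence,

$$\lim\,(F_n(t) - \Phi_n(t)) = 0 \quad\text{as}\quad n \to \infty$$

by Corollary 1. It suffices therefore to show

$$\Phi_n(t) \to \frac{1}{\sqrt{\pi}}\int_{-\infty}^{t} e^{-x^2}\,dx \quad\text{as}\quad n \to \infty,$$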

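and that can be done by the method of moments. By the polynomial
theorem

$$\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \sum \frac{m!}{\alpha!\,\beta! \cdots \lambda!}\,\frac{S_{\alpha,\beta,\ldots\lambda}}{2^{m/2}B_n^{m/2}}$$

where the summation extends over all systems of positive integers
$\alpha \ge \beta \ge \cdots \ge \lambda$ satisfying the condition

$$\alpha + \beta + \cdots + \lambda = m$$

and $S_{\alpha,\beta,\ldots\lambda}$ denotes a symmetrical function of letters $x_1, x_2, \ldots x_n$
determined by one of its terms

$$x_1^{\alpha}x_2^{\beta} \cdots x_l^{\lambda}$$

if $l$ represents the number of integers $\alpha, \beta, \ldots \lambda$. Since variables
$x_1, x_2, \ldots x_n$ are independent, we have

$$E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \frac{m!}{2^{m/2}}\sum \frac{1}{\alpha!\,\beta! \cdots \lambda!}\,\frac{G_{\alpha,\beta,\ldots\lambda}}{B_n^{m/2}}$$

where $G_{\alpha,\beta,\ldots\lambda}$ is obtained by replacing powers of variables by mathematical
expectations of these powers. It is almost evident that

$$\frac{|G_{\alpha,\beta,\ldots\lambda}|}{B_n^{m/2}} \le \frac{c_1^{(\alpha)} + c_2^{(\alpha)} + \cdots + c_n^{(\alpha)}}{B_n^{\alpha/2}} \cdot \frac{c_1^{(\beta)} + c_2^{(\beta)} + \cdots + c_n^{(\beta)}}{B_n^{\beta/2}} \cdots \frac{c_1^{(\lambda)} + c_2^{(\lambda)} + \cdots + c_n^{(\lambda)}}{B_n^{\lambda/2}}.$$

Now if not all the exponents $\alpha, \beta, \ldots \lambda$ are $= 2$ (all of them being $= 2$ is possible
only when $m$ is even), by virtue of Corollary 3 the right member as well as

$$\frac{G_{\alpha,\beta,\ldots\lambda}}{B_n^{m/2}}$$

tends to 0. Hence

$$\lim E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = 0$$

if $m$ is odd.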

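But for even $m$ we have

$$\lim E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \frac{m!}{2^m}\lim \frac{G_{2,2,\ldots 2}}{B_n^{m/2}}. \tag{23}$$

Let us consider now ($m$ being even)

$$\left(\frac{B_n'}{B_n}\right)^{m/2} = \left(\frac{c_1^{(2)} + c_2^{(2)} + \cdots + c_n^{(2)}}{B_n}\right)^{m/2} = \sum \frac{(m/2)!}{\lambda!\,\mu! \cdots \omega!}\,\frac{H_{\lambda,\mu,\ldots\omega}}{B_n^{m/2}}$$

where summation extends over all systems of positive integers
$\lambda \ge \mu \ge \cdots \ge \omega$ satisfying the condition

$$\lambda + \mu + \cdots + \omega = \frac{m}{2}$$

and $H_{\lambda,\mu,\ldots\omega}$ is a symmetric function of $c_1^{(2)}, c_2^{(2)}, \ldots c_n^{(2)}$ determined by
its term

$$(c_1^{(2)})^{\lambda}(c_2^{(2)})^{\mu} \cdots (c_l^{(2)})^{\omega},$$

$l$ being the number of subscripts $\lambda, \mu, \ldots \omega$. Apparently

$$\frac{H_{\lambda,\mu,\ldots\omega}}{B_n^{m/2}} \le \frac{(c_1^{(2)})^{\lambda} + (c_2^{(2)})^{\lambda} + \cdots + (c_n^{(2)})^{\lambda}}{B_n^{\lambda}} \cdots \frac{(c_1^{(2)})^{\omega} + (c_2^{(2)})^{\omega} + \cdots + (c_n^{(2)})^{\omega}}{B_n^{\omega}}.$$

Besides

$$c_k^{(2)} \le N^2, \qquad (c_k^{(2)})^e \le N^{2e-2}c_k^{(2)}$$

and

$$\frac{(c_1^{(2)})^e + (c_2^{(2)})^e + \cdots + (c_n^{(2)})^e}{B_n^e} \le \left(\frac{N^2}{B_n}\right)^{e-1}$$

if $e > 1$. Thus

$$\frac{H_{\lambda,\mu,\ldots\omega}}{B_n^{m/2}} \to 0$$

if not all subscripts $\lambda, \mu, \ldots \omega$ are equal to 1. It follows that

$$\lim \left(\frac{B_n'}{B_n}\right)^{m/2} = \left(\frac{m}{2}\right)!\,\lim \frac{H_{1,1,\ldots 1}}{B_n^{m/2}}.$$

But by Corollary 2

$$\frac{B_n'}{B_n} \to 1$$

and evidently $H_{1,1,\ldots 1} = G_{2,2,\ldots 2}$. Hence

$$\lim \frac{G_{2,2,\ldots 2}}{B_n^{m/2}} = \frac{1}{(m/2)!}$$

and this in connection with (23) shows that for an even $m$

$$\lim E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \frac{m!}{2^m\,(m/2)!}.$$

Finally, no matter whether the exponent $m$ is odd or even, we have

$$\lim E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} x^m e^{-x^2}\,dx.$$

Tshebysheff-Markoff's fundamental theorem can be applied directly
and leads to the result:

$$\lim \Phi_n(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{t} e^{-x^2}\,dx$$

uniformly in $t$. On the other hand, as has been established before,

$$\lim\,[F_n(t) - \Phi_n(t)] = 0$$

uniformly in $t$. Hence, finally

$$\lim F_n(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{t} e^{-x^2}\,dx$$

uniformly in $t$.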

And this is the fundamental limit theorem with Liapounoff's conditions
now proved by the method of moments. This proof, due to
Markoff, is simple enough and of high elegance. However, preliminary
considerations which underlie the proof of the fundamental theorem,
though simple and elegant also, are rather long. Nevertheless, we must
bear in mind that they are not only useful in connection with the theory
of probability, but they have great importance in other fields of analysis.

APPENDIX III

ON A GAUSSIAN PROBLEM

1. In a letter to Laplace dated January 30, 1812,¹ Gauss mentions a
difficult problem in probability for which he could not find a perfectly
satisfactory solution. We quote from his letter (in translation from the French):

I recall, however, a curious problem with which I occupied myself 12 years
ago, but which I did not then succeed in solving to my satisfaction. Perhaps
you will deign to occupy yourself with it for a few moments: in that case I am sure that you
will find a more complete solution. Here it is: Let M be an unknown quantity
between the limits 0 and 1 for which all values are either equally probable
or more or less so according to a given law; suppose it converted into a continued
fraction

$$M = \cfrac{1}{a' + \cfrac{1}{a'' + \cdots}}$$

What is the probability that, on stopping the expansion at a finite term
$a^{(n)}$, the following fraction

$$\cfrac{1}{a^{(n+1)} + \cfrac{1}{a^{(n+2)} + \cdots}}$$

lies between the limits 0 and x? I denote it by P(n, x) and, supposing all
values equally probable, I have

$$P(0, x) = x.$$

P(1, x) is a transcendental function depending upon the function

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{x}$$

which Euler calls inexplicable and on which I have just given several
researches in a memoir presented to our Society of Sciences which will soon be
printed. But for the case where n is larger, the exact value of P(n, x) seems
intractable. However, I have found by very simple reasoning that for
infinite n

$$P(n, x) = \frac{\log (1 + x)}{\log 2}.$$

But the efforts that I made at the time of my researches to assign

$$P(n, x) - \frac{\log (1 + x)}{\log 2}$$

for a very large, but not infinite, value of n have been fruitless.

¹ Gauss' Werke, X, 1, p. 371.

The problem itself and the main difficulty in its solution are clearly 
indicated in this passage. The problem is difficult indeed, and no 
satisfactory solution was offered before 1928, when Professor R. O. 
Kuzmin succeeded in solving it in a very remarkable and elegant way. 

2. Analytical Expression for $P_n(x)$. We shall use the notation
$P_n(x)$ for the probability which Gauss designated by P(n, x). The first
question that presents itself is how to express $P_n(x)$ in a proper analytical
form. Let $\delta(v_1, v_2, \ldots, v_n, x)$ be an interval whose end points are
represented by two continued fractions:

$$\cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n}}}
\qquad\text{and}\qquad
\cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n + x}}}$$

with positive integer incomplete quotients $v_1, v_2, \ldots, v_n$, while x is a
positive number $\le 1$. Two such intervals corresponding to two different
systems of integers $v_1, v_2, \ldots, v_n$ and $v_1', v_2', \ldots, v_n'$ do not overlap;
that is, do not have common inner points. For, if they had a common
inner point represented by an irrational number N (which we can always
suppose), we should have for some positive $x' < 1$ and $x'' < 1$

$$N = \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n + x'}}}
   = \cfrac{1}{v_1' + \cfrac{1}{v_2' + \ddots + \cfrac{1}{v_n' + x''}}}.$$

But that is impossible unless $v_1' = v_1$, $v_2' = v_2$, $\ldots$, $v_n' = v_n$.

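A number M being selected at random between 0 and 1 and converted
into a continued fraction

$$M = \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n + \xi}}},$$

if the quantity $\xi$ turns out to be contained between 0 and $x < 1$, M must
belong to one (and only one) of the intervals $\delta(v_1, v_2, \ldots, v_n, x)$
corresponding to one of all the possible systems of n positive integers
$v_1, v_2, \ldots, v_n$. Since M has a uniform distribution of probability and
since the length of the interval $\delta(v_1, v_2, \ldots, v_n, x)$ is

$$(-1)^n\left[\cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n + x}}}
 - \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n}}}\right],$$

the required probability $P_n(x)$ will be expressed by the sum

$$P_n(x) = \sum (-1)^n\left[\cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n + x}}}
 - \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n}}}\right]$$

extended over all systems of positive integers $v_1, v_2, \ldots, v_n$. In general
let

$$\frac{P_i}{Q_i} = \cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_i}}}
\qquad (i = 1, 2, \ldots, n)$$

be a convergent to the continued fraction

$$\cfrac{1}{v_1 + \cfrac{1}{v_2 + \ddots + \cfrac{1}{v_n}}}.$$

Then the above expression for $P_n(x)$ can be exhibited in a more convenient
form:

$$P_n(x) = \sum_{v_1, v_2, \ldots, v_n} (-1)^n\left[\frac{P_n + xP_{n-1}}{Q_n + xQ_{n-1}} - \frac{P_n}{Q_n}\right]. \tag{1}$$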

By the very definition of $P_n(x)$ we must have $P_n(1) = 1$; hence the
important relation

$$\sum \frac{1}{Q_n(Q_n + Q_{n-1})} = 1. \tag{2}$$

This result can also be established directly by resorting to the original
expression of $P_n(1)$ and performing summation first with respect to $v_1$,
then with respect to $v_2$, etc.

Relation (2) can be interpreted as follows: Let $\delta$ in general be the
length of an interval $\delta(v_1, v_2, \ldots, v_n, 1)$. Then

$$\sum \delta = 1,$$

summation being extended over the (enumerable) set of intervals $\delta$.
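
The statements of this section lend themselves to a check in exact rational arithmetic. The following is a minimal sketch, not part of the original text: the helper names `convergents`, `endpoint` and `interval_length` are hypothetical, and the starting values $P_0 = 0$, $Q_0 = 1$ are an assumption about the convention for convergents. It compares the length of $\delta(v_1, \ldots, v_n, 1)$ with $1/Q_n(Q_n + Q_{n-1})$ and checks the partial sums of relation (2) for n = 1.

```python
from fractions import Fraction

def convergents(vs):
    """Return (P_n, Q_n, P_{n-1}, Q_{n-1}) for 1/(v_1 + 1/(v_2 + ... + 1/v_n)).

    Assumed starting values: P_0 = 0, Q_0 = 1 and P_{-1} = 1, Q_{-1} = 0.
    """
    p_prev, q_prev = 1, 0   # index -1
    p, q = 0, 1             # index 0
    for v in vs:
        p, p_prev = v * p + p_prev, p
        q, q_prev = v * q + q_prev, q
    return p, q, p_prev, q_prev

def endpoint(vs, x):
    """Exact value of 1/(v_1 + 1/(v_2 + ... + 1/(v_n + x)))."""
    value = Fraction(x)
    for v in reversed(vs):
        value = 1 / (v + value)
    return value

def interval_length(vs):
    """Length of the interval delta(v_1, ..., v_n, 1)."""
    return abs(endpoint(vs, 1) - endpoint(vs, 0))

vs = [3, 1, 4, 1, 5]
p, q, p_prev, q_prev = convergents(vs)
# The length of delta(v_1, ..., v_n, 1) should equal 1 / (Q_n (Q_n + Q_{n-1})).
print(interval_length(vs) == Fraction(1, q * (q + q_prev)))

# Relation (2) for n = 1: the partial sum over v_1 <= V equals 1 - 1/(V + 1).
V = 1000
partial = sum(Fraction(1, v * (v + 1)) for v in range(1, V + 1))
print(partial == 1 - Fraction(1, V + 1))
```

Exact fractions are used so that the comparison is an identity rather than an approximation; for n > 1 the multiple series converges slowly, which is why only n = 1 is summed here.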
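3. The Derivative of $P_n(x)$. In attempting to show that $P_n(x)$
tends uniformly to a limit function as $n \to \infty$ it is easier to begin with its
derivative $p_n(x)$. The series

$$\sum \frac{1}{(Q_n + xQ_{n-1})^2}$$

obtained by formal derivation of (1) is uniformly convergent in the
interval (0, 1). For

$$Q_n + xQ_{n-1} \ge Q_n \ge \tfrac{1}{2}(Q_n + Q_{n-1}),$$

whence

$$\frac{1}{(Q_n + xQ_{n-1})^2} \le \frac{2}{Q_n(Q_n + Q_{n-1})},$$

and the series

$$\sum \frac{2}{Q_n(Q_n + Q_{n-1})} = 2$$

is convergent. Hence

$$\frac{dP_n(x)}{dx} = p_n(x) = \sum \frac{1}{(Q_n + xQ_{n-1})^2}.$$

Since

$$Q_n = v_nQ_{n-1} + Q_{n-2}$$

we have

$$p_n(x) = \sum \frac{1}{\left(Q_{n-1} + \dfrac{1}{v_n + x}\,Q_{n-2}\right)^2} \cdot \frac{1}{(v_n + x)^2}$$

and, performing summation with respect to $v_1, v_2, \ldots, v_{n-1}$ for constant
$v_n$,

$$p_n(x) = \sum_{v_n = 1}^{\infty} \frac{1}{(v_n + x)^2}
 \sum_{v_1, \ldots, v_{n-1}} \frac{1}{\left(Q_{n-1} + \dfrac{1}{v_n + x}\,Q_{n-2}\right)^2},$$

whence

$$p_n(x) = \sum_{v_n = 1}^{\infty} p_{n-1}\!\left(\frac{1}{v_n + x}\right) \frac{1}{(v_n + x)^2}$$

or else

$$p_n(x) = \sum_{v = 1}^{\infty} p_{n-1}\!\left(\frac{1}{v + x}\right) \frac{1}{(v + x)^2}, \tag{3}$$

an important recurrence relation which permits determining completely
the sequence of functions

$$p_1(x),\ p_2(x),\ p_3(x),\ \ldots$$

starting with $p_0(x) = 1$.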
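
As a worked instance of relation (3), the first step n = 1 can be written out explicitly. The computation below is a standard one added here for illustration only; it is not taken from the text.

```latex
\begin{aligned}
p_1(x) &= \sum_{v=1}^{\infty} \frac{1}{(v+x)^2}, \\
P_1(x) &= \int_0^x p_1(t)\,dt
        = \sum_{v=1}^{\infty} \left( \frac{1}{v} - \frac{1}{v+x} \right), \\
P_1(1) &= \sum_{v=1}^{\infty} \left( \frac{1}{v} - \frac{1}{v+1} \right) = 1 .
\end{aligned}
```

For positive integer values of x the series in the second line reduces to $1 + \frac{1}{2} + \cdots + \frac{1}{x}$, which is the function Gauss refers to in connection with P(1, x).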

4. Discussion of a More General Recurrence Relation. In discussing
relation (3) the fact that $p_0(x) = 1$ is of no consequence. We may start
with any function $f_0(x)$ subject to some natural limitations, and form a
sequence

$$f_1(x),\ f_2(x),\ f_3(x),\ \ldots$$

by means of the recurrence relation

$$f_n(x) = \sum_{v = 1}^{\infty} f_{n-1}\!\left(\frac{1}{v + x}\right) \frac{1}{(v + x)^2}. \tag{4}$$

The following properties of $f_n(x)$ follow easily from this relation:
a. If

$$f_0(x) = \frac{a}{1 + x}$$

then

$$f_n(x) = \frac{a}{1 + x}; \qquad n = 1, 2, 3, \ldots$$

For

$$f_1(x) = a \sum_{v = 1}^{\infty} \frac{1}{(v + x)(v + x + 1)} = \frac{a}{1 + x},$$

whence the general statement follows immediately.
b. If

$$\frac{m}{1 + x} \le f_0(x) \le \frac{M}{1 + x}$$

then

$$\frac{m}{1 + x} \le f_n(x) \le \frac{M}{1 + x}.$$

Follows from (a) and equation (4) itself.

As a corollary we have: Let $M_n$ and $m_n$ be the precise upper and
lower bounds of

$$(1 + x) f_n(x) \qquad (n = 0, 1, 2, \ldots)$$

in the interval $0 \le x \le 1$. Then

$$M_0 \ge M_1 \ge M_2 \ge \cdots$$

$$m_0 \le m_1 \le m_2 \le \cdots$$

c. We have

$$\int_0^1 f_n(x)\,dx = \int_0^1 f_{n-1}(x)\,dx.$$

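d. The following relations can easily be established by mathematical
induction:

$$f_n(x) = \sum f_0\!\left(\frac{P_n + xP_{n-1}}{Q_n + xQ_{n-1}}\right) \frac{1}{(Q_n + xQ_{n-1})^2}$$

$$f_{2n}(x) = \sum f_n\!\left(\frac{P_n + xP_{n-1}}{Q_n + xQ_{n-1}}\right) \frac{1}{(Q_n + xQ_{n-1})^2}$$

$$f_{3n}(x) = \sum f_{2n}\!\left(\frac{P_n + xP_{n-1}}{Q_n + xQ_{n-1}}\right) \frac{1}{(Q_n + xQ_{n-1})^2}$$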
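
Property (a) can be observed numerically. The sketch below is an illustration under stated assumptions rather than anything from the text: `gauss_step` is a hypothetical name for a truncated version of the sum in relation (4), and `fixed` is the function $a/(1 + x)$ with a = 1 by default.

```python
def gauss_step(f, x, terms=10_000):
    """Truncated form of relation (4): sum of f(1/(v+x)) / (v+x)^2 for v = 1..terms."""
    return sum(f(1.0 / (v + x)) / (v + x) ** 2 for v in range(1, terms + 1))

def fixed(x, a=1.0):
    """The function a / (1 + x) of property (a)."""
    return a / (1.0 + x)

for x in (0.0, 0.25, 0.5, 1.0):
    # Up to rounding, the truncated sum falls short of 1/(1+x) by 1/(terms + 1 + x),
    # because the terms 1/((v+x)(v+x+1)) telescope.
    print(x, gauss_step(fixed, x), fixed(x))
```

The two printed columns agree to about four decimal places; the small gap is the tail of the telescoping series, not a failure of the property.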


Let us suppose now that the function $f_0(x)$ defined in the interval

$$0 \le x \le 1$$

possesses a derivative everywhere in this interval and let $\mu_0$ be an upper
bound of $|f_0'(x)|$ while M is an upper bound of $|(1 + x) f_0(x)|$. Then by
property (b)

$$|f_n(x)| \le M; \quad |f_{2n}(x)| \le M; \quad |f_{3n}(x)| \le M; \ \ldots$$

The function $f_n(x)$ represented by the series

$$f_n(x) = \sum f_0(u)\,\frac{1}{(Q_n + xQ_{n-1})^2}$$

where u stands for

$$\frac{P_n + xP_{n-1}}{Q_n + xQ_{n-1}}$$

has a derivative; for the series obtained by a formal differentiation

$$\sum (-1)^n \frac{f_0'(u)}{(Q_n + xQ_{n-1})^4} - 2 \sum f_0(u)\,\frac{Q_{n-1}}{(Q_n + xQ_{n-1})^3}$$

is uniformly convergent and represents $f_n'(x)$. Now

$$\frac{Q_{n-1}}{Q_n + xQ_{n-1}} \le 1$$

and

$$\frac{1}{(Q_n + xQ_{n-1})^2} \le \frac{1}{Q_n^2} \le \frac{2}{Q_n(Q_n + Q_{n-1})}.$$

Hence

$$\left| 2 \sum f_0(u)\,\frac{Q_{n-1}}{(Q_n + xQ_{n-1})^3} \right|
 \le 4M \sum \frac{1}{Q_n(Q_n + Q_{n-1})} = 4M$$

by virtue of (2). On the other hand, the inequality

$$Q_n(Q_n + Q_{n-1}) = (v_nQ_{n-1} + Q_{n-2})[(v_n + 1)Q_{n-1} + Q_{n-2}] > 2Q_{n-1}(Q_{n-1} + Q_{n-2})$$

holding for $n \ge 2$ together with an evident inequality

$$Q_1(Q_1 + Q_0) \ge 2$$

shows that

$$Q_n(Q_n + Q_{n-1}) > 2^n \qquad (n \ge 2).$$

Thus

$$(Q_n + xQ_{n-1})^4 \ge Q_n^4 \ge \tfrac{1}{4}\,[Q_n(Q_n + Q_{n-1})]^2 > 2^{n-2}\,Q_n(Q_n + Q_{n-1})$$

and consequently

$$\left| \sum (-1)^n \frac{f_0'(u)}{(Q_n + xQ_{n-1})^4} \right| < \frac{\mu_0}{2^{n-2}}.$$

Hence, we may conclude that

$$\mu_1 = \frac{\mu_0}{2^{n-2}} + 4M$$

is an upper bound of $|f_n'(x)|$. Similarly, starting with the second equation
in (d), we find that

$$\mu_2 = \frac{\mu_1}{2^{n-2}} + 4M$$

is an upper bound of $|f_{2n}'(x)|$, and so forth. In general, the recurrence
relation

$$\mu_k = \frac{\mu_{k-1}}{2^{n-2}} + 4M \qquad (k = 1, 2, 3, \ldots)$$

determines upper bounds of

$$|f_n'(x)|,\ |f_{2n}'(x)|,\ |f_{3n}'(x)|,\ \ldots$$

It is easy to see that in general

$$\mu_k < \frac{\mu_0}{2^{k(n-2)}} + \frac{4M}{1 - 2^{-(n-2)}}$$

so that for sufficiently large n

$$\mu_k < 5M.$$

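5. Main Inequalities. Let

$$\varphi_0(x) = f_0(x) - \frac{m_0}{1 + x}.$$

Then

$$f_n(x) - \frac{m_0}{1 + x} = \sum \varphi_0(u)\,\frac{1}{(Q_n + xQ_{n-1})^2}
 \ge \frac{1}{2} \sum \varphi_0(u)\,\frac{1}{Q_n(Q_n + Q_{n-1})}.$$

Since the intervals $\delta$ defined at the end of Sec. 2 do not overlap and cover
completely the whole interval (0, 1), we may write:

$$l = \frac{1}{2} \int_0^1 \varphi_0(x)\,dx = \frac{1}{2} \sum \int_{\delta} \varphi_0(x)\,dx
 = \frac{1}{2} \sum \varphi_0(u_1)\,\frac{1}{Q_n(Q_n + Q_{n-1})},$$

the latter part following from the mean value theorem and $u_1$ being a
number contained within the interval $\delta$. By subtraction we find

$$f_n(x) - \frac{m_0}{1 + x} - l \ge \frac{1}{2} \sum \frac{\varphi_0(u) - \varphi_0(u_1)}{Q_n(Q_n + Q_{n-1})}$$

and, since both u and $u_1$ belong to the same interval $\delta$,

$$\varphi_0(u) - \varphi_0(u_1) > -\frac{\mu_0 + m_0}{Q_n(Q_n + Q_{n-1})} > -\frac{\mu_0 + m_0}{2^n}.$$

Consequently,

$$f_n(x) - \frac{m_0}{1 + x} - l > -\frac{\mu_0 + m_0}{2^{n+1}}$$

and a fortiori

$$f_n(x) > \frac{m_0 + l - 2^{-n}(\mu_0 + m_0)}{1 + x}.$$

It follows that

$$m_1 \ge m_0 + l - 2^{-n}(\mu_0 + m_0).^{*} \tag{5}$$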

In a similar way, considering the function

$$\psi_0(x) = \frac{M_0}{1 + x} - f_0(x)$$

and setting

$$l_1 = \frac{1}{2} \int_0^1 \psi_0(x)\,dx,$$

we shall have

$$f_n(x) < \frac{M_0 - l_1 + 2^{-n}(\mu_0 + M_0)}{1 + x},$$

whence

$$M_1 \le M_0 - l_1 + 2^{-n}(\mu_0 + M_0). \tag{6}$$

Further, from (5) and (6)

$$M_1 - m_1 \le M_0 - m_0 + 2^{-n+1}(\mu_0 + M_0) - l - l_1.$$

But

$$l + l_1 = \tfrac{1}{2} \log 2 \cdot (M_0 - m_0) = (1 - k)(M_0 - m_0); \qquad k < 0.66,$$

so that finally

$$M_1 - m_1 < k(M_0 - m_0) + 2^{-n+1}(\mu_0 + M_0).$$

Starting with $f_n(x), f_{2n}(x), \ldots$ instead of $f_0(x)$, in a similar way we find

$$M_2 - m_2 < k(M_1 - m_1) + 2^{-n+1}(\mu_1 + M_1)$$

$$M_3 - m_3 < k(M_2 - m_2) + 2^{-n+1}(\mu_2 + M_2)$$

$$\cdots$$

$$M_n - m_n < k(M_{n-1} - m_{n-1}) + 2^{-n+1}(\mu_{n-1} + M_{n-1}).$$

From these inequalities it follows that

$$M_n - m_n < (M_0 - m_0)k^n + 2^{-n+1}\left[\mu_0 k^{n-1} + \mu_1 k^{n-2} + \cdots + \mu_{n-1}
 + M_0 k^{n-1} + M_1 k^{n-2} + \cdots + M_{n-1}\right].$$

* $M_1$, $m_1$ are used here with the same meaning as $M_n$, $m_n$ in Sec. 4.

Without losing anything in generality, we may suppose that $f_0(x)$ is a
positive function. Then

$$M_k \le M_0, \qquad \mu_k < 5M_0 \qquad (k = 1, 2, 3, \ldots)$$

at least for sufficiently large n. Owing to these inequalities we shall have

$$M_n - m_n < (M_0 - m_0)k^n + \frac{\mu_0 + 6M_0}{(1 - k)\,2^{n-1}}. \tag{7}$$

This inequality shows that sequences

$$M_0 \ge M_1 \ge M_2 \ge \cdots$$

$$m_0 \le m_1 \le m_2 \le \cdots$$

approach a common limit a. The following method can be used to find
the value of this limit. Let N be an arbitrary sufficiently large integer
and n the integer defined by

$$n^2 \le N < (n + 1)^2.$$

Then

$$\frac{m_n}{1 + x} \le f_{n^2}(x) \le \frac{M_n}{1 + x}$$

and therefore

$$\frac{m_n}{1 + x} \le f_N(x) \le \frac{M_n}{1 + x}.$$

The last inequality permits presenting $f_N(x)$ thus:

$$f_N(x) = \frac{a}{1 + x} + \theta(M_n - m_n); \qquad |\theta| < 1, \tag{8}$$

whence

$$\int_0^1 f_N(x)\,dx = \int_0^1 f_0(x)\,dx = a \log 2 + \theta'(M_n - m_n), \qquad |\theta'| < 1,$$

and, because $M_n - m_n$ ultimately becomes as small as we please in
absolute value,

$$a \log 2 = \int_0^1 f_0(x)\,dx.$$

Equation (8) shows clearly that the sequence of functions

$$f_0(x),\ f_1(x),\ f_2(x),\ \ldots$$

defined by the recurrence relation (4) approaches uniformly the limit
function

$$\frac{a}{1 + x}$$

where

$$a = \frac{1}{\log 2} \int_0^1 f_0(x)\,dx.$$
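
The uniform approach to $a/(1 + x)$ can be watched numerically. What follows is a rough sketch under assumptions of our own, not the author's procedure: the function is sampled on an equally spaced grid (`GRID` points), values between grid points are obtained by linear interpolation, and the series in relation (4) is cut off after `TERMS` terms with a simple tail estimate. Starting from $f_0(x) = 1$, for which $a = 1/\log 2$, it records the largest value of $|(1 + x) f_n(x) \log 2 - 1|$ after each step.

```python
import math

GRID = 101      # number of equally spaced grid points on [0, 1] (assumed)
TERMS = 200     # explicit terms of the series before the tail estimate (assumed)

def interpolate(values, u):
    """Piecewise-linear interpolation of grid values at a point u in [0, 1]."""
    pos = u * (len(values) - 1)
    i = min(int(pos), len(values) - 2)
    frac = pos - i
    return values[i] * (1.0 - frac) + values[i + 1] * frac

def step(values):
    """One application of relation (4) to a function sampled on the grid."""
    new_values = []
    for j in range(len(values)):
        x = j / (len(values) - 1)
        total = sum(interpolate(values, 1.0 / (v + x)) / (v + x) ** 2
                    for v in range(1, TERMS + 1))
        # Tail estimate: f(1/(v+x)) ~ f(0), and the sum of (v+x)^-2 over
        # v > TERMS is approximately 1/(TERMS + x + 1/2).
        total += values[0] / (TERMS + x + 0.5)
        new_values.append(total)
    return new_values

def max_deviation(values):
    """Largest |(1 + x) f(x) log 2 - 1| over the grid."""
    n = len(values) - 1
    return max(abs((1.0 + j / n) * values[j] * math.log(2.0) - 1.0)
               for j in range(len(values)))

values = [1.0] * GRID            # f_0(x) = 1
deviations = []
for _ in range(6):
    values = step(values)
    deviations.append(max_deviation(values))
print(deviations)
```

The recorded deviations shrink quickly from step to step, in line with the geometric decrease expressed by inequality (7); the grid size and cutoff limit how small they can become in this crude discretization.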


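6. Solution of the Gaussian Problem. It suffices to apply the preceding
considerations to the case $f_0(x) = p_0(x) = 1$. In this case $M_0 = 2$,
$m_0 = 1$, $\mu_0 = 0$ and

$$a = \frac{1}{\log 2}.$$

Consequently,

$$p_N(x) = \frac{1}{(1 + x) \log 2} + \theta\left(k^n + \frac{12}{(1 - k)\,2^{n-1}}\right); \qquad |\theta| < 1,$$

where $n = [\sqrt{N}]$. It suffices to integrate this expression between limits
0 and $t < 1$ to find

$$P_N(t) = \frac{\log (1 + t)}{\log 2} + \theta' t\left(k^n + \frac{12}{(1 - k)\,2^{n-1}}\right); \qquad |\theta'| < 1.$$

As $N \to \infty$

$$P_N(t) \to \frac{\log (1 + t)}{\log 2}$$

as stated by Gauss. Moreover,

$$\left| P_N(t) - \frac{\log (1 + t)}{\log 2} \right| < k^n + \frac{12}{(1 - k)\,2^{n-1}}$$

for sufficiently large, but finite N.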
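
Gauss's limit can also be compared with a simulation. The following is a minimal Monte Carlo sketch, assuming floating-point arithmetic is accurate enough for a small number of steps; the names `remainder_after` and `estimate_P`, the sample size and the seed are our own choices. A number M is drawn uniformly, the first n incomplete quotients are removed by repeating the map $x \to 1/x - [1/x]$, and the fraction of cases in which the remainder does not exceed t is compared with $\log (1 + t) / \log 2$.

```python
import math
import random

def remainder_after(m, n):
    """Apply x -> 1/x - floor(1/x) n times to m, i.e. drop the first n quotients."""
    x = m
    for _ in range(n):
        if x == 0.0:          # the expansion terminated (m was numerically rational)
            return 0.0
        x = 1.0 / x
        x -= math.floor(x)
    return x

def estimate_P(n, t, samples=20_000, seed=0):
    """Monte Carlo estimate of P_n(t) for M uniformly distributed on (0, 1)."""
    rng = random.Random(seed)
    hits = sum(remainder_after(rng.random(), n) <= t for _ in range(samples))
    return hits / samples

t = 0.5
print(estimate_P(8, t), math.log(1.0 + t) / math.log(2.0))
```

With n = 0 the estimate is close to t itself, as it must be since $P_0(t) = t$, while for moderate n it settles near the logarithmic value; the agreement is limited by sampling error of the order of the inverse square root of the sample size.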
   z      Probability integral

  0.24    0.0948
  0.25    0.0987
  0.26    0.1026
  0.27    0.1064
  0.28    0.1103
  0.29    0.1141
  0.30    0.1179
  0.31    0.1217
  0.32    0.1255
  0.33    0.1293
  0.34    0.1331
  0.35    0.1368
  0.36    0.1406
  0.37    0.1443
  0.38    0.1480
  0.39    0.1517
  0.40    0.1554
  0.41    0.1591
  0.42    0.1628
  0.43    0.1664
  0.44    0.1700
  0.45    0.1736
  0.46    0.1772
  0.47    0.1808
  0.48    0.1844
  0.49    0.1879
  0.50    0.1915
  0.51    0.1950
  0.52    0.1985
  0.53    0.2019
  0.54    0.2054
  0.55    0.2088
  0.56    0.2123
  0.57    0.2157
  0.58    0.2190
  0.59    0.2224
  0.60    0.2257
  0.61    0.2291
  0.62    0.2324
  0.63    0.2357
  0.64    0.2389

Arrangements, 18

B 

Bayes’ formula (theorem), 61 
Bernoulli criterion, 6 
Bernoulli theorem, 96 
Bernoulli trials, 45 
Bernstein, S., inequality, 205 
Bertrand’s paradox, 251
Buffon’s needle problem, 113, 251 
Barbier’s solution of, 253 

C 

Cantelli’s theorem, 101 
Cauchy’s distribution, 243, 275 
Characteristic function, composition of, 
275 

of distribution, 240, 264 
Coefficient, correlation, 339 
divergence, 212, 214, 216 
Combinations, 18 

Compound probability, theorem of, 31 
Continued fractions, 358, 361, 396 
Markoff’s method of, 52 
Continuous variables, 235 
Correlation, normal (see Normal correlation)

Correlation coefficient, distribution of, 
339 

D 

Difference equations, ordinary, 75, 78 
partial, 84 

Dispersion, definition, 172
of sums, 173 

Distribution, Cauchy's, 243, 275 
characteristic function of, 264 
of correlation coefficient, 339 


Distribution, determination of, 271 
equivalent point, 369 
general concept of, 263 
normal (Gaussian), 243 
Poisson’s, 279 
“Student’s,” 339

Distribution function of probability, 
239, 263 

Divergence coefficient, empirical, 212 
Lexis’ case, 214 
Poisson’s case, 214 
theoretical, 212 
Tschuprow’s theorem, 216 

E 

Elementary errors, hypothesis of, 296 
Ellipses of equal probability, 311, 328 
Estimation of error term, 295 
Euler’s summation formula, 177, 201, 
303, 347 

Events, compound, 29 
contingent, 3 
dependent, 33 
equally likely, 4, 5, 7
exhaustive, 6 
future, 65 
incompatible, 37 
independent, 32, 33 
mutually exclusive, 6, 27 
opposite, 29 

Expectation, mathematical, 161 
of a product, 171 
of a sum, 165 

F 

Factorials, 349 
Fourier theorem, 241 
French lottery, 19, 108 
Frequency, 96 

Fundamental lemma (see Limit theorem) 
Fundamental theorem (see Tshebysheff-Markoff theorem)

G

Gaussian distribution, 243 
Gaussian problem, 396 
Generating function of probabilities, 
47, 78, 85, 89, 93, 94

H 

Hermite polynomials, 72 
Hypothesis of elementary errors, 296 

I 

Independence, definition of, 32, 33

K

Khintchine (see Law of large numbers) 
Kolmogoroff (see Law of large numbers; 
Strong law of large numbers) 

L 

Lagrange’s series, 84, 150 
Laplace-Liapounoff (see Limit theorem) 
Laplace’s problem, 255 
Laurent’s series, 87, 148 
Law of large numbers, generalization 
by Markoff, 191
for identical variables (Khintchine), 195
Kolmogoroff’s lemma, 201
theorem, 185

Tshebysheff’s lemma, 182 
Law of repeated logarithm, 204 
Law of succession, 69 
Lexis’ case, 214 

Liapounoff condition (see Limit theorem)
Liapounoff inequality, 265 
Limit theorem, Bernoullian case, 131
for sums of independent vectors, 318, 
323, 325, 326 
fundamental lemma, 284 
Laplace-Liapounoff, 284 
Line of regression, 314 
Lottery, French (see French lottery) 

M 

Marbe’s problem, 231 
Markoff’s theorem, infinite dispersion, 
191 


Markoff’s theorem, for simple chains, 301 
Markoff-Tshebysheff theorem (see 
Tshebysheff-Markoff theorem) 
Mathematical expectation, definition of, 
161 

of a product, 171 
of a sum, 165 

Mathematical probability, definition of, 
6 

Moments, absolute, 240, 264 
inequalities for, 264 
method of (Markoff’s), 356ff.

N 

Normal correlation, 313 
origin of, 327 

Normal distribution, Gaussian, 243 
two-dimensional, 308 

P 

Pearson’s “χ²-test,” 327
Permutations, 18 
Point, of continuity, 261, 356 
of increase, 262, 356 
Poisson series, 182, 293 
Poisson’s case, 214 
Poisson’s distribution, 279 
Poisson’s formula, 137 
Poisson’s theorem, 208, 294 
Polynomials, Hermite (see Hermite) 
Probability, approximate evaluation of, 
by Markoff’s method, 52 
compound, 29, 31 
conditional, 33 
definition (classical) of, 6 
total, 27, 28 

Probability integral, 128 
table of, 407 

R 

Relative frequency, 96 
Runs, problem of, 77 

S 

Simple chains, 74, 223, 297 
Markoff’s theorem for, 301 
Standard deviation, 173 
Stieltjes’ integrals, 261
Stirling's formula, 349 
Stochastic variables, 161 
Strong law of large numbers (Kolmogoroff), 202
“Student’s” distribution, 339

T

Table of probability integral, 407 
Tests of significance, 331 
Total probability, theorem of, 27, 28 
Trials, dependent, independent, repeated, 
44, 45 


Tschuprow (see Divergence coefficient)
Tshebysheff-Markoff theorem, fundamental, 304, 384
application, 388 
Tshebysheff’s inequalities, 373 
Tshebysheff’s inequality, 204 
Tshebysheff’s lemma, 182 
Tshebysheff’s problem, 199 

V 

Variables, continuous, 236 
independent, 171 
stochastic, 161 
Vectors (see Limit theorem)





